Abstract

The Static Single Information (SSI) form is a compiler intermediate 
representation that allows efficient sparse implementations of predicated 
analysis and backward dataflow algorithms. It possesses several attractive 
graph-theoretic properties which aid in program analysis. An extension to 
SSI form, SSI+, is also presented, along with a complete executable abstract
semantics for the representation. Applications to abstract interpretation 
and hardware compilation are discussed. 

The SSI form has been implemented in the FLEX compiler infrastructure,
and it has been used to implement several analyses and optimizations.
Details on these predicated analysis techniques are presented, as well as
data from the practical implementation.
1 Introduction



This thesis introduces a compiler intermediate representation: Static
Single Information (SSI) form. This IR is the core of the FLEX compiler
project, which is primarily investigating intelligent compilation techniques
for distributed systems. In presenting the IR, this thesis attempts to keep
both the mathematician and the programmer in mind. SSI form has both
a rigorous mathematical semantics and a factored form which aids efficient
implementation of advanced analyses. I believe that it effectively straddles
the gap between dataflow-oriented, graph-structured, and control-flow-driven
IRs, while maintaining the sparsity needed to achieve practical efficiency.
The construction algorithms run in time that is, in practice, linear in the
size of the program.

Our discussion of the Static Single Information form will at times be
tied to the source language of the FLEX compiler, Java. Unlike many
abstract IRs, the choices made in the design of SSI form have been dictated
by the necessities of compiling a real-world imperative language. Java,
however, has several theoretical properties that make program analysis more
tractable. In particular, we mention here Java's strict constraints on
pointer variables: pointers in earlier languages such as C can be abused in
many ways that Java disallows.

Ultimately, the choice of compiler internal representation is fundamental.
Advances in IRs translate into advances in compilers. SSI form represents
a clean and simple unification of many extant ideas, and our hope is that
it will allow the FLEX compiler to achieve a similar integration of
practical implementation and mathematical elegance.



2 Context and goals 

Strong et al. [40]¹ first advocated the use of compiler intermediate
representations in a 1958 committee report. Their idealistic "universal
intermediate language" was called UNCOL. Thirty years later, the Static
Single Assignment (SSA) form was introduced by Alpern, Rosen, Wegman, and
Zadeck as a tool for efficient optimization in a pair of POPL papers
[2, 35], and three years after that Cytron and Ferrante joined Rosen,
Wegman, and Zadeck in explaining how to compute SSA form efficiently in
what has since become the "canonical" SSA paper [10]. Johnson and Pingali
[20] trace the development of SSA form back to Shapiro and Saint [37],
while Havlak [17] views φ-functions as descendants of the "birthpoints"
introduced in [34].

Despite industry adoption of SSA form in production compilers [8, 9],
academic research into alternative representations continues. Recent
proposals have included Value Dependence Graphs [45], Program Dependence
Webs [5], the Program Structure Tree [19], DJ graphs [39], and Dependence
Flow Graphs [20].

In comparison to these representations, the dominant characteristics of 
our Static Single Information form may be summarized as follows: 

• It names information units. 

• It is complete. 

• It is simple. 

• It is efficient. 

• It has no explicit control dependencies.

• It supports both forward and reverse dataflow analyses.

SSI form is used as an IR for the FLEX compiler for the Java programming
language, which informs some of these design decisions. The FLEX compiler
does deep analysis and will support hardware/software co-design. SSI
addresses these needs, concentrating on analysis rather than optimization.
We will address each design point in turn.

1 Attribution by Aho [1].

It names information units. SSA form (which we will describe further
in section 4) assigns unique names to unique static values of a variable.
However, it ignores the value information which may be added to a
variable at program branch points. SSI form renames variables at branch
points, which allows us to associate unique names with unique information
about static values. For example, a program may test the value of an
integer against zero before using it as a divisor. After the branch on the
tested predicate, it is possible to make statements about the value (its
equality or inequality to zero) which were impossible to make previously.
SSI form allows us to exploit this additional information.

It is complete. By this we mean that there exists an executable
semantics for the IR that does not require the use of information external
to the IR. The original SSA form, and most derivatives, require use of the
original program control flow graph during analysis, translation, or direct
execution. In fact, φ-functions are intimately tied to the precise input
edge structure of the control flow graph, and switch nodes (where control
flow splits) are undecipherable without referring to the control flow graph.

In practice, this seems not a great disadvantage: it merely forces us to
maintain a mapping of SSA statements to nodes (equivalently, basic blocks)
of the original control flow graph. But maintaining this correspondence
complicates editing the IR. It also complicates the interpretation of the
program as a set of simultaneous equations, which SSI form will allow us
to do. Finally, explicit control flow may limit the available parallelism
of the program.

SSI+, as it will be presented in section 7, overcomes these difficulties 
and presents a complete representation of program meaning as a set of 
simultaneous equations, without resort to graph information. 

It is simple. A bestiary of new φ-like functions has been introduced
in the past decade, including μ-, γ-, and η-functions in [5, 43], ψ- and
π-functions in [24], interprocedural φ-functions in [26], μ- and
χ-functions in [9], φ- and η-functions in [14],² and Λ-functions in [27],
among others. Some of these are orthogonal to our work: the techniques of
[24] can be used to extend SSI form to explicitly parallel source
languages, and those of [9] to languages with local variable aliasing
(absent in Java). Our goal is to achieve minimal conceptual complexity in
SSI form; that is, to introduce the minimum set of φ-like functions
necessary to represent the "interesting" properties of the compiled program.

It is efficient. Construction of SSI form should be fast, and space
requirements should be reasonable. The original SSA algorithms required
$O(E + V_{SSA}|DF| + N V_{SSA})$ time.³ This bound was dominated by the
time and space required to construct the dominance frontier, as $|DF|$,
the size of the dominance frontier, could be $O(N^2)$ for common cases.
Taking the dominant term, we abbreviate the time complexity of Cytron's
SSA-construction algorithm as $O(N^2 V)$.
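The quadratic size of the dominance frontier is easy to reproduce on a small case. The sketch below computes DF straight from its definition ($y \in DF(x)$ when x dominates a predecessor of y but does not strictly dominate y) and applies it to k nested repeat-until loops, a construct often used to illustrate the blow-up. The graph builder `nested_repeat_until` and its node names are hypothetical, and the set-based dominator computation favors clarity over speed.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def dominators(succ: Graph, start: str) -> Dict[str, Set[str]]:
    """Iterative set-based dominators; assumes every node is reachable from start."""
    pred: Dict[str, List[str]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            pred[s].append(n)
    dom = {n: set(succ) for n in succ}   # start from "everything dominates n"
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == start:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def dominance_frontier(succ: Graph, start: str) -> Dict[str, Set[str]]:
    """DF(x) = {y : x dominates a predecessor of y, x does not strictly dominate y}."""
    dom = dominators(succ, start)
    df: Dict[str, Set[str]] = {n: set() for n in succ}
    for p, targets in succ.items():
        for y in targets:
            for x in dom[p]:
                if x == y or x not in dom[y]:
                    df[x].add(y)
    return df


def nested_repeat_until(k: int) -> Graph:
    """Hypothetical CFG of k nested repeat-until loops: headers h1..hk, tests tk..t1."""
    succ: Graph = {"START": ["h1"], "END": []}
    for j in range(1, k + 1):
        succ[f"h{j}"] = [f"h{j + 1}"] if j < k else [f"t{k}"]
        # Each test loops back to its header or falls through to the enclosing test.
        succ[f"t{j}"] = [f"h{j}", f"t{j - 1}" if j > 1 else "END"]
    return succ


for k in range(1, 6):
    g = nested_repeat_until(k)
    total = sum(len(s) for s in dominance_frontier(g, "START").values())
    print(f"k={k}: N={len(g)} nodes, |DF|={total}")
```

For these graphs $N = 2k + 2$ while $|DF| = k(k + 1)$, so the frontier grows quadratically in N.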

Our algorithms do not require the construction of a dominance frontier
(building on recent work on efficient SSA construction in this regard) and
run in so-called "linear" time. A more detailed analysis will be given in
section 5.4, but suffice it to say for now that our construction and
analysis algorithms are efficient.⁴

2 Compare to [5, 43].

3 See section 3 for definitions of the variables used in the complexity
bounds of these two paragraphs.

All explicit control dependencies are eliminated. Some researchers
(including [4] and [32]) view control dependence as a fundamental property
of the CFG, and [5, 4] suggest that accurate knowledge of
control-dependence relations is the sole key to automatic parallelization.
Often, incomplete intermediate representations⁵ are augmented with
control-dependence edges to express proper program semantics; see [20] on
DFGs and [45] on VDGs, for example.

Unfortunately, explicit control-flow edges tend to serialize computation
more than strictly necessary. Figure 7.1 on page 75, for example, contains
two parallel loops which would be serialized by the explicit control
dependency between them. Prior work often focused on fine-grain intra-loop
parallelism and ignored this coarser inter-loop parallelism.⁶ Our objective
in this work is to fully utilize coarse parallelism by removing
source-language control-dependency artifacts.

It is efficient for both forward and backward dataflow analyses.
It is often observed that traditional SSA form cannot handle backward
dataflow analysis. Johnson and Pingali note this, and suggest
anticipatability as an example of a backward dataflow analysis where their
dependence flow graph representation improves on SSA form [20]. Lo et al.
suggest the use of an "SSU" form to address much the same issue [27]. There
are in fact many analyses where both use and definition information is
utilized, and where dataflow in both forward and reverse directions occurs.
SSI form is able to handle both of these cases, as we demonstrate in
section 6.1.



4 Dhamdhere [12] quite correctly states that Cytron's original algorithm
has a worst-case time bound of $O(N^3)$. This is also true for our
algorithms. However, these worst-case bounds are pessimistic for real
programs; we will present experimental evidence that run times on real
programs are $O(N)$.

5 See page 9 for our definition of "completeness" in an IR.

6 We discuss the dataflow-architecture work of Traub [42] in particular in
section 7.5.
3 Definitions

We next provide some definitions. Our complexity metrics will usually be 
in terms of the following variables: 

N is the number of nodes in the program control flow graph. Each node 
represents either a single statement or a basic block; the difference is 
unimportant for complexity metrics. 

E is the number of edges in the program control flow graph. For most
programs E is reasonably assumed to be $O(N)$, since most nodes
have either one or two successors (simple assignments and conditional
branches, respectively). Unusual use of computed-goto and switch
statements may invalidate this assumption, but in these cases E is
generally a better metric of program "complexity" than N. For this
reason, we will call $O(E)$ "linear in program size".

V is the number of variables in the program. 

U is the total number of variable uses in the program. 

As the transformations we will describe split and rename variables, we will
use subscripts to denote the number of variables, uses, or definitions in
a particular transformed version of a program. For example, $U_{SSA}$ is
the number of uses in the SSA form (see section 4) of a program. When it
is necessary to explicitly denote a metric on the untransformed program, a
zero subscript will be used; for example, $V_0$.

Graphs will be directed unless specified otherwise. If X and Y are
nodes in some graph G, an edge from X to Y is written $X \to Y$. A path
$X = s_0 \to s_1 \to \cdots \to s_n = Y$ is written $X \xrightarrow{+} Y$.
A simple path is one in which all the nodes $s_i$ in it are distinct.
Control-flow graphs are assumed to be connected, and to contain unique
START and END nodes marking procedure entry and exit points, respectively.
To ensure that graphs representing infinite loops are connected, an edge will
typically exist between the START and END nodes. The presence of unique
START and END nodes ensures that the dominance and post-dominance
relations define trees rooted at START and END, respectively.

For simplicity, we will assume that every node in the control-flow graph
with one successor and one predecessor contains exactly one statement. A
node with no predecessors and a node with no successors (START and END)
are empty; they contain no statements. Nodes with multiple successors or
multiple predecessors are also empty for conventional program
representations, but may contain multiple φ- or σ-function assignment
statements in the SSA and SSI forms we will discuss. No node may have
both multiple predecessors and multiple successors.

The symbol $\sqcap$ will be used for the dataflow "meet" operator. The
operator $\sqsubseteq$ is the partial ordering relation for a lattice, and
$x \sqsubset y$ iff $x \sqsubseteq y$ and $x \neq y$.

4 Static Single Assignment form 

Static Single Information (SSI) form derives many features from Static 
Single Assignment (SSA) form, as described by Cytron in [10]. To provide 
context for our definition of SSI form in section 5, we review SSA form. 

4.1 Definition of SSA form 

Static Single Assignment form is a sparse program representation in which
each variable has exactly one definition point. As a consequence, only one
assignment can reach each use, which means that SSA form can be viewed
as a type of sparse def-use chain [1].

Original program (left):
    P ← (X ≠ 2)
    if P jump
      false branch:  Y ← 4 + X
      true branch:   Y ← 6 + X
                     Z ← 5
    Y ← Y + 1
    /* no further uses of X or Z */

Single assignment version (right):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
      false branch:  Y₁ ← 4 + X₀
      true branch:   Y₂ ← 6 + X₀
                     Z₁ ← 5
    join:            Y₃ ← φ(Y₁, Y₂)
                     Z₂ ← φ(Z₀, Z₁)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

Figure 4.1: A simple program (left) and its single assignment version
(right).

For straight-line code, the SSA transformation is straightforward: each
assignment to a variable is given a unique name (conventionally indicated
by the use of a subscripted version of the original variable name) and each
use is renamed to match its reaching definition. Special φ-functions must
be inserted at join points to preserve the single-assignment property.
These φ-functions have the form $v \leftarrow \phi(v_1, v_2)$ and perform
an assignment according to the path by which control flow reaches the
φ-function. Figure 4.1 shows a simple program and its SSA form; the
φ-function $Y_3 \leftarrow \phi(Y_1, Y_2)$ in the SSA version on the right
assigns $Y_3$ the value of $Y_1$ if control flow reaches it along the false
branch of the if statement. If the true branch is taken, $Y_3$ gets the
value of $Y_2$ at the φ-function.
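For straight-line code the renaming is a single forward pass over the statements. A minimal sketch, assuming a hypothetical (target, operands) statement encoding in which a digit suffix stands for the subscript; it inserts no φ-functions, since those are needed only at join points.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]          # variable name or integer constant
Stmt = Tuple[str, List[Operand]]   # (target, operands): target <- f(operands)


def rename_straight_line(stmts: List[Stmt]) -> List[Stmt]:
    """Give each assignment a fresh name V1, V2, ... and rename each use to its
    reaching definition. Variables read before any assignment are treated as
    defined on entry and named V0 (like X0 in Figure 4.1)."""
    version: Dict[str, int] = {}   # last subscript handed out per variable
    current: Dict[str, str] = {}   # name of the reaching definition per variable
    out: List[Stmt] = []
    for target, operands in stmts:
        renamed: List[Operand] = []
        for op in operands:
            if isinstance(op, str):
                if op not in current:          # upward-exposed use
                    current[op] = f"{op}0"
                    version[op] = 0
                renamed.append(current[op])
            else:
                renamed.append(op)
        # Uses are renamed before the definition, so "Y <- Y + 1" reads the old Y.
        version[target] = version.get(target, 0) + 1
        current[target] = f"{target}{version[target]}"
        out.append((current[target], renamed))
    return out


print(rename_straight_line([("Y", [6, "X"]), ("Z", [5]), ("Y", ["Y", 1])]))
# [('Y1', [6, 'X0']), ('Z1', [5]), ('Y2', ['Y1', 1])]
```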

Formally, a program is said to be in SSA form if the following three 
conditions hold: 
1. If two nonnull paths $X \xrightarrow{+} Z$ and $Y \xrightarrow{+} Z$ converge at a
node Z, and nodes X and Y contain assignments to [a variable] V (in the
original program), then a trivial φ-function $V \leftarrow \phi(V, \ldots, V)$
has been inserted at Z (in the new program).

2. Each mention of V in the original program or in an inserted φ-function
has been replaced by a mention of a new variable $V_i$, leaving the new
program in SSA form.

3. Along any control flow path, consider any use of a variable V (in
the original program) and the corresponding use of $V_i$ (in the new
program). Then V and $V_i$ have the same value.

This formulation of the definition is due to Cytron et al. [11]. Note that
the definition does not prohibit "extra" φ-functions not strictly required
by condition 1.

4.2 Minimal and pruned SSA forms 

Cytron et al. [11] define minimal SSA form as an SSA form using the
smallest number of φ-functions such that the above three conditions hold.
The SSA form in the previous example (Figure 4.1 on the facing page) is
minimal.

A variation on minimal SSA form, called pruned form, avoids placing
φ-functions which define variables that are never used. The φ-functions
in pruned form are a subset of those in minimal form, and as such pruned
form does not strictly satisfy the given SSA criteria. In most cases, the
more regular properties of minimal SSA form outweigh the pruned form's
slight increase in space efficiency. Choi, Cytron, and Ferrante [7] give a
formal definition and construction algorithm for pruned SSA.




Minimal SSA form (left):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
      false branch:  Y₁ ← 4 + X₀
      true branch:   Y₂ ← 6 + X₀
                     Z₁ ← 5
    join:            Y₃ ← φ(Y₁, Y₂)
                     Z₂ ← φ(Z₀, Z₁)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

Pruned SSA form (right):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
      false branch:  Y₁ ← 4 + X₀
      true branch:   Y₂ ← 6 + X₀
                     Z₁ ← 5
    join:            Y₃ ← φ(Y₁, Y₂)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

Figure 4.2: Minimal (left) and pruned (right) SSA forms.

Figure 4.2 compares minimal and pruned SSA form for our example 
program. 



5 Static Single Information form 

SSI form extends SSA form to achieve symmetry for both forward and
reverse dataflow. SSI form recognizes that information about variables
is generated at branches and generates new names at these points. This
provides us with a one-to-one mapping between variable names and
information about the variables at each point in the program. Analyses can
then associate information with variable names and propagate this
information efficiently and directly both with and against the
control-flow direction.
SSA form (left):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
      false branch:  Y₁ ← 4 + X₀
      true branch:   Y₂ ← 6 + X₀
                     Z₁ ← 5
    join:            Y₃ ← φ(Y₁, Y₂)
                     Z₂ ← φ(Z₀, Z₁)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

SSI form (right):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
    split:           (X₁, X₂) ← σ(X₀)
      false branch:  Y₁ ← 4 + X₁
      true branch:   Y₂ ← 6 + X₂
                     Z₁ ← 5
    join:            X₃ ← φ(X₁, X₂)
                     Y₃ ← φ(Y₁, Y₂)
                     Z₂ ← φ(Z₀, Z₁)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

Figure 5.1: A comparison of SSA (left) and SSI (right) forms.



5.1 Definition of SSI form 

Building SSI form involves adding pseudo-assignments for a variable V:

(φ) at a control-flow merge when disjoint paths from a conditional branch
come together and at least one of the paths contains a definition of
V; and

(σ) at locations where control-flow splits and at least one of the disjoint
paths from the split uses the value of V.

Figure 5.1 compares the SSA and SSI forms for the example of
Figure 4.1. Note that X is renamed at the conditional branch, allowing the
compiler to distinguish $X_1$ (which is always the constant 2) from
$X_2$ (which is never equal to 2).
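Renaming at the split is what lets a predicated analysis exploit the branch. A minimal sketch of one constant-propagation step, assuming a conventional three-level lattice (the names `Top`, `Bottom`, `Const`, `sigma_ne`, and `add_const` are hypothetical, not part of FLEX): because $X_1$ has its own name, the analysis can record that it is the constant 2 even when $X_0$ is unknown, and $Y_1 \leftarrow 4 + X_1$ then folds to 6.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Top:        # no information yet (e.g. the edge is not known to execute)
    pass


@dataclass(frozen=True)
class Bottom:     # not a compile-time constant
    pass


@dataclass(frozen=True)
class Const:
    value: int


Value = Union[Top, Bottom, Const]
TOP, BOTTOM = Top(), Bottom()


def sigma_ne(x0: Value, c: int) -> Tuple[Value, Value]:
    """Values for (X1, X2) <- sigma(X0) at a branch on P0 = (X0 != c).

    X1 flows along the false edge, where X0 == c is known to hold; X2 flows
    along the true edge, where only X0 != c holds, which this lattice cannot express.
    """
    if isinstance(x0, Top):
        return TOP, TOP
    if isinstance(x0, Const):
        # The predicate is itself constant, so only one edge is executable.
        return (x0, TOP) if x0.value == c else (TOP, x0)
    return Const(c), BOTTOM


def add_const(v: Value, k: int) -> Value:
    """Abstract version of k + v."""
    return Const(v.value + k) if isinstance(v, Const) else v


x1, x2 = sigma_ne(BOTTOM, 2)           # X0 unknown, as in Figure 5.1
print(add_const(x1, 4), add_const(x2, 6))
# Const(value=6) Bottom()
```

Treating the outgoing value on an unexecutable edge as Top mirrors conditional constant propagation; that choice is an assumption of this sketch rather than something the definition of SSI form requires.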

Formally, a program transformation to SSI form satisfies the following 
conditions: 
1. If two nonnull paths $X \xrightarrow{+} Z$ and $Y \xrightarrow{+} Z$ exist having only the
node Z where they converge in common, and nodes X and Y contain either
assignments to a variable V in the original program or a φ- or σ-function
for V in the new program, then a φ-function for V has been inserted at Z
in the new program. [Placement of φ-functions.]

2. If two nonnull paths $Z \xrightarrow{+} X$ and $Z \xrightarrow{+} Y$ exist having only the
node Z where they diverge in common, and nodes X and Y contain either
uses of a variable V in the original program or a φ- or σ-function for
V in the new program, then a σ-function for V has been inserted at
Z in the new program. [Placement of σ-functions.]

3. For every node X containing a definition of a variable V in the new
program and node Y containing a use of that variable, there exists
at least one path $X \xrightarrow{+} Y$ and no such path contains a definition of
V other than at X. [Naming after φ-functions.]

4. For every pair of nodes X and Y containing uses of a variable V defined
at node Z in the new program, either every path $Z \xrightarrow{+} X$ must contain
Y or every path $Z \xrightarrow{+} Y$ must contain X. [Naming after σ-functions.]

5. For the purposes of this definition, the START node is assumed to
contain a definition and the END node a use for every variable in the
original program. [Boundary conditions.]

6. Along any possible control-flow path in a program being executed,
consider any use of a variable V in the original program and the
corresponding use of $V_i$ in the new program. Then, at every occurrence
of the use on the path, V and $V_i$ have the same value. The path need
not be cycle-free. [Correctness.]




As with the SSA conditions, this definition does not prohibit "extra"
φ- or σ-functions not required by conditions 1 and 2.

Property 5.1. There exists exactly one reaching definition of V at
every non-φ-function use of V in the new program.

Proof. Offner [29] defines a reaching definition as follows: 

A definition of a variable v reaches the point P in the program 
iff there is a path from the definition to P on which. . . there is 
no other definition of v 

From this definition and condition 3 we directly obtain the property. □

Note that condition 3 and this property do not require there to be 
exactly one definition of any variable V, just that at every use only a single 
definition is relevant. The renaming algorithm we will present enforces the 
stricter single-definition constraint. 

Property 5.2. Every cycle-free path $S \xrightarrow{+} Y$ from the START node to
a node Y containing a non-φ-function use of a variable must contain
exactly one node X defining that variable in the new program. Likewise,
every path $X \xrightarrow{+} E$ from a node X containing a non-σ-function
definition of a variable to the END node must contain every node Y which
is a use of that variable in the new program.

Proof. Let us call the variable v. Conditions 5 and 6 ensure that there
exists at least one definition node X for v from which Y is reachable:
conditions 5 and 6 substitute the START node, from which every node is
reachable, for any use of v not reachable by some other definition in the
original program. So assume this definition node X exists, but is not on
the path $S \xrightarrow{+} Y$. Then $X \xrightarrow{+} Y$ and $S \xrightarrow{+} Y$ must have some
earliest node N in common. But N must then have a φ-function for v by
condition 1, which violates either our choice of Y as a non-φ-function use
(if N = Y) or else condition 3, which prohibits definitions other than at
X. If $S \xrightarrow{+} Y$ contains more than one node $X_i$ defining v, then the
subpath $X_0 \xrightarrow{+} Y$ from the first such node $X_0$ to Y also violates
condition 3. So $S \xrightarrow{+} Y$ must contain exactly one definition X of v.

The second part is symmetric. Assume there exists some node Y using
v which is not contained on some path $X \xrightarrow{+} E$. The path
$X \xrightarrow{+} Y$ must exist by conditions 3 and 5. And $X \xrightarrow{+} E$ and
$X \xrightarrow{+} Y$ must have some final node N in common, which must have a
σ-function for v by condition 2. The case N = X violates the choice of X
as a non-σ-function definition. But if N ≠ X, then condition 3, which
prohibits paths with multiple definitions, is violated. Thus
$X \xrightarrow{+} E$ must contain every use of v. □

Property 5.3. Every definition of a variable V dominates all
non-φ-function uses of V, and every use of V post-dominates any
non-σ-function reaching definition of V in the new program.

Proof. The dominance relation is defined in Offner [29] as: 

If x and y are two elements in a flow graph G, then x dominates 
y (x is a dominator of y) iff every path from s [START] to y 
includes x. 

Post-dominance is the dual on a flow graph with edges reversed: x
post-dominates y iff every path from y to END includes x (equivalently,
every path from END to y in the reversed graph includes x).

The previous property showed that every path from START to a
non-φ-function use contains a unique definition node X. If two paths from
START to Y contained different definition nodes $X_i$, then Y would have
to contain a φ-function, which it was chosen not to. So every
non-φ-function use is dominated by the single definition node. Likewise
the previous property showed that every path from a non-σ-function
definition to END must include every use; therefore every use
post-dominates a non-σ-function definition. □
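Property 5.3 is phrased in terms of dominance and post-dominance, and both trees come from one routine, since post-dominators are dominators of the reversed graph rooted at END. The sketch below uses the iterative algorithm of Cooper, Harvey, and Kennedy (a swapped-in technique, not necessarily the one used in FLEX) on a hypothetical CFG shaped like the example of Figure 5.1; its recursive DFS is adequate for small graphs only.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def reverse_graph(succ: Graph) -> Graph:
    rev: Graph = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            rev[s].append(n)
    return rev


def immediate_dominators(succ: Graph, root: str) -> Dict[str, str]:
    """Cooper-Harvey-Kennedy iterative algorithm; idom[root] == root."""
    order: List[str] = []
    seen = set()

    def visit(n: str) -> None:       # recursive DFS producing a postorder
        seen.add(n)
        for s in succ[n]:
            if s not in seen:
                visit(s)
        order.append(n)

    visit(root)
    rpo = order[::-1]                # reverse postorder
    index = {n: i for i, n in enumerate(rpo)}
    pred = reverse_graph(succ)
    idom = {root: root}

    def intersect(a: str, b: str) -> str:
        # Walk both fingers up the current dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in rpo[1:]:
            done = [p for p in pred[n] if p in idom]   # predecessors already processed
            new = done[0]
            for p in done[1:]:
                new = intersect(p, new)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom


cfg: Graph = {"START": ["cmp"], "cmp": ["split"], "split": ["false", "true"],
              "false": ["join"], "true": ["join"], "join": ["END"], "END": []}
idom = immediate_dominators(cfg, "START")
ipdom = immediate_dominators(reverse_graph(cfg), "END")   # post-dominator tree
print(idom["join"], ipdom["split"])
# split join
```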

5.2 Minimal and pruned SSI forms 

Minimal and pruned SSI forms can be defined which parallel their SSA
counterparts. Minimal SSI form has the smallest number of φ- and
σ-functions such that the above conditions are satisfied. Pruned SSI
form is the minimal form with any unused φ- and σ-functions deleted; that
is, it contains no φ- or σ-functions after which there are no subsequent
uses, other than in φ- or σ-functions, of any of the variables defined on
the left-hand side.⁷ Figure 5.2 on the next page compares minimal and
pruned SSI form for our example program.

Note that, as in SSA form, pruned SSI does not strictly satisfy the SSI
constraints because it omits dead φ- and σ-functions otherwise required by
conditions 1 and 2 of the definition. In practice, a subtractive definition
of pruned form (generate minimal form and then remove the unused φ- and
σ-functions) is most useful, but a constructive definition can be
generated from the standard SSI form definition as follows:

1. The convergence/divergence node Z of conditions 1 and 2 must also
satisfy: "and there exists a path $Z \xrightarrow{+} U$ from Z to U, a use of V in
the original program, which does not contain another definition of V."

2. The boundary condition 5 at END can be loosened as follows (the
modification is the restriction to variables live at END): "For the
purposes of this definition, the START node is assumed to contain a
definition for every variable in the original program and the END node a
use for every variable live at END in the original program."

7 An even more compact SSI form may be produced by removing σ-functions for
which there are uses for exactly one of the variables on the left-hand side,
but by doing so one loses the ability to perform renaming at control-flow
splits which generate additional value information.

Minimal SSI form (left):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
    split:           (X₁, X₂) ← σ(X₀)
      false branch:  Y₁ ← 4 + X₁
      true branch:   Y₂ ← 6 + X₂
                     Z₁ ← 5
    join:            X₃ ← φ(X₁, X₂)
                     Y₃ ← φ(Y₁, Y₂)
                     Z₂ ← φ(Z₀, Z₁)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

Pruned SSI form (right):
    P₀ ← (X₀ ≠ 2)
    if P₀ jump
    split:           (X₁, X₂) ← σ(X₀)
      false branch:  Y₁ ← 4 + X₁
      true branch:   Y₂ ← 6 + X₂
                     Z₁ ← 5
    join:            Y₃ ← φ(Y₁, Y₂)
    Y₄ ← Y₃ + 1
    /* no further uses of X or Z */

Figure 5.2: Minimal (left) and pruned (right) SSI forms.

Pruned form is defined as having the minimal set of φ- and σ-functions
that satisfy the amended conditions. It can easily be verified that the
modifications suffice to eliminate unused φ- and σ-functions: a φ- or
σ-function for V is placed at Z only if there exists a path
$Z \xrightarrow{+} U$ to a use U of V, as mandated by amendment 1, where
amendment 2 lets U = END for variables that are live on exit from the
procedure and are thus usefully defined.

Property 5.4. A node Z gets a φ- or σ-function for some variable $V_i$
in pruned SSI form only if the corresponding variable V is live at Z in
the original program.

Proof. This is a trivial restatement of amendment 1. A variable v is said
to be live at some node N if there exists a node U using v and a path
$N \xrightarrow{+} U$ on which no definitions of v are to be found. If V is not
live at Z then no path $Z \xrightarrow{+} U$ satisfying the amended conditions 1
and 2 can be found, and neither a φ- nor a σ-function can be placed.
Amendment 2 ensures this holds true at boundaries. □
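Property 5.4 reduces pruning to a liveness question, so a standard backward dataflow pass is enough to decide which φ- and σ-functions survive. A minimal sketch, assuming a hypothetical encoding of the example program in which the split node holds only the branch (so it uses P), END is not modeled, and liveness is tested on exit from Z; for the empty merge node this equals liveness on entry.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def live_out(succ: Graph, uses: Dict[str, Set[str]],
             defs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Iterative backward liveness:
    out[n] = union of in[s] over successors s;  in[n] = uses[n] | (out[n] - defs[n])."""
    live_in: Dict[str, Set[str]] = {n: set() for n in succ}
    out: Dict[str, Set[str]] = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in reversed(list(succ)):   # visiting roughly backward speeds convergence
            new_out = set().union(*(live_in[s] for s in succ[n]))
            new_in = uses.get(n, set()) | (new_out - defs.get(n, set()))
            if new_out != out[n] or new_in != live_in[n]:
                out[n], live_in[n] = new_out, new_in
                changed = True
    return out


cfg: Graph = {"cmp": ["split"], "split": ["false", "true"],
              "false": ["join"], "true": ["join"], "join": ["incr"], "incr": []}
uses = {"cmp": {"X"}, "split": {"P"}, "false": {"X"}, "true": {"X"}, "incr": {"Y"}}
defs = {"cmp": {"P"}, "false": {"Y"}, "true": {"Y", "Z"}, "incr": {"Y"}}

lo = live_out(cfg, uses, defs)
# Minimal SSI places phi-functions for X, Y, Z at the join and a sigma-function
# for X at the split; pruning keeps only the variables live there.
print(sorted({"X", "Y", "Z"} & lo["join"]), sorted({"X"} & lo["split"]))
# ['Y'] ['X']
```

The result matches the pruned form of Figure 5.2: only Y keeps its φ-function and only X its σ-function.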

5.3 Fast construction of SSI form 

The most common construction algorithm for SSA form [11] uses dominance
frontiers and suffers from a possible quadratic blow-up in the size of the
dominance frontier for certain common programming constructs. Various
improved algorithms use such things as DJ graphs [38] and the dependence
flow graph [20] to achieve $O(EV)$ time complexity for φ-function
placement. We build on this work to achieve $O(EV)$ construction of SSI
form, and present a new algorithm for variable renaming in SSI form after
φ- and σ-functions are placed.

Our construction algorithm begins with a program structure tree of
single-entry single-exit (SESE) regions, constructed as described by
Johnson, Pearson, and Pingali [19]. We will review the algorithms
involved, as their published descriptions [18] contain a number of errors.

We begin with a few definitions from [19]. 

Definition 5.1. Edges a and b are said to be edge cycle-equivalent
in a graph iff every cycle containing a contains b, and vice-versa.
Similarly, two nodes are said to be node cycle-equivalent iff every
cycle containing one of the nodes also contains the other.
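Definition 5.1 can be checked directly on small undirected graphs, which is handy as a test oracle for the linear-time algorithm reviewed below. This brute-force sketch is not the algorithm of Johnson, Pearson, and Pingali; it relies on the standard fact that, in a connected graph where every edge lies on some cycle, two edges are cycle-equivalent exactly when deleting both disconnects the graph. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def connected_without(n: int, edges: List[Edge], removed: Set[int]) -> bool:
    """Is the undirected graph on nodes 0..n-1 still connected after deleting `removed`?"""
    adj: List[List[int]] = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        if i not in removed:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n


def cycle_equivalence_classes(n: int, edges: List[Edge]) -> List[int]:
    """Label each edge with the smallest edge index in its cycle-equivalence class.

    Assumes a connected graph in which every edge lies on a cycle. If deleting a and b
    disconnects it, {a, b} is a cut that any cycle crosses an even number of times, so a
    cycle through a also uses b; otherwise some cycle uses a but not b.
    Cost is O(E^2 (N + E)), so this is only a test oracle.
    """
    parent = list(range(len(edges)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(edges)), 2):
        if not connected_without(n, edges, {i, j}):
            parent[find(j)] = find(i)
    label: Dict[int, int] = {}
    for i in range(len(edges)):
        label.setdefault(find(i), i)        # first (smallest) index seen per class
    return [label[find(i)] for i in range(len(edges))]


# A square 0-1-2-3 with a diagonal 0-2: {e0, e1} and {e2, e3} are classes; e4 is alone.
print(cycle_equivalence_classes(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))
# [0, 0, 2, 2, 4]
```

Once step 1 below makes the control flow graph strongly connected, every edge of the resulting undirected graph should lie on a cycle, so the premise of this check is met there.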

Definition 5.2. A SESE region in a graph G is an ordered edge pair 
(a, b) of distinct control flow edges a and b where 

1. a dominates b, 

2. b postdominates a, and 

3. every cycle containing a also contains b and vice-versa. 
Edges a and b are called the entry and exit edges, respectively.

Definition 5.3. A SESE region (a, b) is canonical provided

1. b dominates b' for any SESE region (a, b'), and

2. a postdominates a' for any SESE region (a', b).

We will give time bounds in terms of N and E, the number of nodes
and edges of the control-flow graph, respectively. Placement of φ- and
σ-functions is also dependent on V, the number of variables in the program.
Since SSI renaming increases the number of variables, we will use $V_0$ and
$V_{SSI}$ to indicate the number of variables in the original program and
SSI form, respectively.

Note that V is at most $O(N)$, since our representation only allows a
constant number of variable definitions per node. Typically $V_0$ will be
much smaller than N, but $V_{SSI}$ need not be. Also E may be as large as
$O(N^2)$, but in most control-flow graphs is $O(N)$ instead, as node arities
are typically limited by a constant.

5.3.1 Cycle-equivalency 

The identification of SESE regions begins by computing the cycle-equivalency
of the edges in the program control flow graph. The cycle-equivalency
algorithm works on undirected graphs, so we prepare the directed control
flow graph G as follows:

1. Add an edge from END to START in G. It is common practice to
add an edge from START to END in order to root the control dependence
graph at START [10]. However, our goal is not rooted control dependence
but to make the control flow graph into a single strongly connected
component; for this reason the direction of the edge is from END to
START instead.
Figure 5.3: Transformation from directed to undirected graph (from [18]).

2. Create an equivalent undirected graph. Johnson et al. prove that
the node expansion illustrated in Figure 5.3 results in an undirected
graph with the same cycle-equivalency properties as the original directed
graph. More precisely, nodes a and b in directed graph G are
cycle-equivalent if and only if nodes a' and b' are cycle-equivalent in
transformed undirected graph G'. The nodes $n_i$ and $n_o$ generated by
the expansion are termed not representative; the node n' in G' is said to
be representative of node n in G. Obviously, this correspondence must be
recorded during the transformation so that we may properly attribute the
cycle-equivalency properties of n' to n later.

3. Perform a pre-order numbering of nodes in G'. This is done
with a simple depth-first search of G'. When we visit a node $a_i$ or
$a_o$, we prefer to visit a' before any other neighbor. This ensures that
representative nodes are interior nodes in the DFS spanning tree. The
START node is numbered 0, and succeeding nodes in the traversal get
increasing numbers. Thus low-numbered nodes are closest to START and
we will call them "highest" in the DFS spanning tree.
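The three preparation steps can be sketched together. Figure 5.3 itself is not reproduced here, so the sketch assumes the expansion suggested by the names in step 2: each node n becomes a chain $n_i$, n', $n_o$ joined by two undirected edges, with incoming edges attached to $n_i$ and outgoing edges to $n_o$. Starting the DFS at the $n_i$ copy of START, the tuple node encoding, and the function name `preprocess` are likewise assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, str]   # (CFG node, "i" | "rep" | "o")


def preprocess(succ: Dict[str, List[str]], start: str, end: str):
    """Steps 1-3: add END -> START, expand nodes, and number G' in DFS preorder."""
    # Step 1: the extra edge makes the CFG a single strongly connected component.
    cfg_edges = [(a, b) for a in succ for b in succ[a]] + [(end, start)]
    adj: Dict[Node, List[Node]] = defaultdict(list)
    # Step 2: n becomes n_i, n', n_o.  Adding these edges first puts n' at the
    # front of the adjacency lists of n_i and n_o, giving the preference of step 3.
    for n in succ:
        for side in ("i", "o"):
            adj[(n, side)].append((n, "rep"))
            adj[(n, "rep")].append((n, side))
    for a, b in cfg_edges:   # directed edge a -> b becomes undirected edge a_o, b_i
        adj[(a, "o")].append((b, "i"))
        adj[(b, "i")].append((a, "o"))
    # Step 3: iterative preorder DFS from START_i, which is numbered 0.
    number: Dict[Node, int] = {(start, "i"): 0}
    stack = [iter(adj[(start, "i")])]
    while stack:
        for nxt in stack[-1]:
            if nxt not in number:
                number[nxt] = len(number)
                stack.append(iter(adj[nxt]))
                break
        else:
            stack.pop()      # every neighbour already numbered: backtrack
    return adj, number


cfg = {"START": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["END"], "END": []}
adj, number = preprocess(cfg, "START", "END")
```

Because n' is first in the adjacency lists of $n_i$ and $n_o$, it is always numbered immediately after whichever of the two is reached first, so it always has a tree child, as step 3 requires.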

The above steps form an undirected graph G' from the control-flow
graph G. The remainder of the cycle-equivalency algorithm is presented
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Data type BracketList: 

create(): BracketList : Make an empty BracketList structure
size(bl:BracketList): integer : Number of elements in BracketList structure
push(bl:BracketList, e:bracket): BracketList : Push e on top of bl
top(bl:BracketList): bracket : Topmost bracket in bl
delete(bl:BracketList, e:bracket): BracketList : Delete e from bl
concat(bl1,bl2:BracketList): BracketList : Concatenate bl1 and bl2

Operations on nodes:

Number(n:node): integer : DFS preorder number of node
NQClass(n:node): integer : Cycle-equivalency class of node
BList(n:node): BracketList : List of brackets of node
Hi(n:node): integer : Highest destination node (by DFS number) of any backedge originating from node n or a descendant of n

Operations on edges:

EQClass(e:edge): integer : Cycle-equivalency class of edge
RecentSize(e:edge): integer : Size of bracket set when e was most recently the topmost bracket for a representative node
RecentClass(e:edge): integer : Cycle-equivalency class number of the representative node for which e was most recently the topmost bracket

Figure 5.4: Datatypes and operations for the cycle-equivalency algorithm. 
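Figure 5.4 relies on every BracketList operation running in constant time. One way to get this, sketched below in Python under our own assumptions, is a circular doubly linked list with a sentinel, where each bracket carries its own links so that delete needs no search. We assume that concat(bl1, bl2) places the brackets of bl1 above those of bl2 and empties bl2; [18] should be consulted for the representation it actually uses. The class and method names are hypothetical.

```python
class Bracket:
    """A backedge or capping backedge, with the per-edge fields of Figure 5.4."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.prev = self.next = None     # links within the owning BracketList
        self.recent_size = -1            # RecentSize
        self.recent_class = None         # RecentClass


class BracketList:
    """Circular doubly linked list with a sentinel; the node after the sentinel is the top."""

    def __init__(self):                  # create()
        self._s = Bracket(None, None)
        self._s.prev = self._s.next = self._s
        self.size = 0

    def push(self, b):
        s = self._s
        b.prev, b.next = s, s.next
        s.next.prev = b
        s.next = b
        self.size += 1
        return self

    def top(self):
        return None if self.size == 0 else self._s.next

    def delete(self, b):
        # precondition (as in Algorithm 5.1): b is currently in this list
        b.prev.next = b.next
        b.next.prev = b.prev
        b.prev = b.next = None
        self.size -= 1
        return self

    def concat(self, other):
        """Splice other's brackets in below ours in O(1); other becomes empty."""
        if other.size == 0:
            return self
        s, o = self._s, other._s
        first, last, bottom = o.next, o.prev, s.prev
        bottom.next, first.prev = first, bottom
        last.next, s.prev = s, last
        self.size += other.size
        o.prev = o.next = o
        other.size = 0
        return self
```

Because concat consumes its second argument, a bracket belongs to at most one list at a time, which is exactly how Algorithm 5.1 uses the children's lists.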
Procedure cycle_equiv (G: CFG)
{
  /* Preprocessing */
  G' := Preprocess(G); /* described in text */

  /* Compute CD equivalence classes */
  for each node n of G', in reverse depth-first order, do {
    /* Compute Hi(n) */
    /* hi0 is highest using backedges only */
    hi0 := min{ Number(t) | (t,n) is a backedge };
    /* hi1 is highest through children */
    hi1 := min{ Hi(c) | c is a child of n };
|   /* hi2 is lowest through children */
|   hi2 := max{ Hi(c) | c is a child of n };
    Hi(n) := min{ hi0, hi1 };

    /* Compute BList(n) */
    BList(n) := create();
    for each child c of n, do
      BList(n) := concat(BList(n), BList(c));
    for each backedge <d, n> from a descendant d of n to n, do
      BList(n) := delete(BList(n), <d, n>);
|   for each capping backedge <d, n> of n, do
|     BList(n) := delete(BList(n), <d, n>);
    for each backedge <n, a> from n to an ancestor a of n, do {
      BList(n) := push(BList(n), <n, a>);
      RecentSize(<n, a>) := -1; /* not a representative node */
    }
    if n has more than one child, then {
      BList(n) := push(BList(n), <n, hi2>); /* capping backedge */
|     RecentSize(<n, hi2>) := -1;
|     add <n, hi2> to capping backedges list of hi2;
    }

    /* Compute NQClass(n) */
    if n is a representative node, then {
      if RecentSize(top(BList(n))) != size(BList(n)), then {
        /* start a new equivalence class */
        RecentSize(top(BList(n))) := size(BList(n));
        RecentClass(top(BList(n))) := new-class-name();
      }
      NQClass(n) := RecentClass(top(BList(n)));
    }
  } /* for each node */
}




Algorithm 5.1: The cycle-equivalency algorithm (corrected from [18]). 




(START,1) =cq (16,END)
(1,2) =cq (8,16)
(2,3) =cq (3,4) =cq (7,8)
(4,5) =cq (5,7)
(4,6) =cq (6,7)
(1,9) =cq (9,10) =cq (14,15) =cq (15,16)
(10,11) =cq (11,13)



Figure 5.5: Control flow graph and cycle-equivalent edges. 



as Algorithm 5.1 on the preceding page, with the above procedure corresponding to the statement G' := Preprocess(G). The algorithm has been corrected from the published version in [18]; in addition, it has been extended to compute both node and edge equivalencies (in effect, merging in the algorithm of [19]). Lines modified from the presentation in [18] are indicated in the figure with a vertical bar in the left margin. The datatype BracketList and the node and edge properties used in the algorithm are described in Figure 5.4 on page 26. The interested reader is encouraged to consult [18] for additional detail on these data structures and representations. Figure 5.5 shows cycle-equivalent regions in a simple control-flow graph. We use the notation (a,b) =cq (c,d) to indicate that the CFG edge from node a to node b is edge cycle-equivalent to the edge from node c to node d.
Calculating cycle-equivalent regions is based on a single reverse depth-first traversal of G', whose size is linear in that of G, so as long as all datatype operations in Figure 5.4 can be completed in constant time (and [18] shows how to do so), this computation is O(E).
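When testing an implementation of Algorithm 5.1 on small graphs, a brute-force oracle can be handy. The Python sketch below is not the bracket-list algorithm. It relies on the result of Johnson et al. that two edges of the strongly connected CFG (with the END to START edge added) are cycle-equivalent exactly when they are cycle-equivalent in the underlying undirected multigraph, and on the graph-theoretic fact that in an undirected graph without bridges two distinct edges are cycle-equivalent exactly when removing both disconnects the graph. It takes O(E^2 (N + E)) time. The function names are hypothetical, every node is assumed to lie on a path from START to END, and edges are assumed to be distinct (source, target) pairs.

```python
from itertools import combinations


def _connected(nodes, edges, skip):
    """Is the undirected multigraph on `nodes` connected once edges in `skip` are removed?"""
    adj = {n: [] for n in nodes}
    for k, (u, v) in enumerate(edges):
        if k not in skip:
            adj[u].append(v)
            adj[v].append(u)
    root = next(iter(nodes))
    seen, stack = {root}, [root]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)


def edge_cycle_classes(cfg_edges):
    """Brute-force edge cycle-equivalence classes of a CFG (END -> START is added)."""
    edges = list(cfg_edges) + [("END", "START")]
    nodes = {n for e in edges for n in e}
    parent = list(range(len(edges)))              # union-find over edge indices

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for i, j in combinations(range(len(edges)), 2):
        if not _connected(nodes, edges, {i, j}):  # {i, j} is a cut pair
            parent[find(i)] = find(j)
    classes = {}
    for k, e in enumerate(edges):
        classes.setdefault(find(k), set()).add(e)
    return list(classes.values())
```

On a diamond-shaped CFG this groups the edge out of START with the edge into END (compare (START,1) =cq (16,END) in Figure 5.5), and puts each arm of the diamond in a class of its own.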

5.3.2 SESE regions and the program structure tree 

Johnson, Pearson, and Pingali show in [19] how to construct a tree structure of nested SESE regions from the cycle-equivalency information. The edges of each cycle-equivalence class are ordered by dominance using a simple depth-first traversal of the graph, and then canonical SESE regions are found by taking adjacent pairs of edges from each class. Another depth-first search of the CFG suffices to obtain the nesting of these regions, which is represented in a data structure called the program structure tree (PST). The algorithm and data structures required are presented in Figure 5.6 and Algorithm 5.2. Figure 5.7 on page 32 shows the SESE regions on the left and the program structure tree on the right for the example of Figure 5.5 on the preceding page. 8

The time complexity for constructing the PST is easily seen to be O(E). Algorithm 5.2 on page 31 begins with a depth-first traversal of G to construct an ordered edge list for each cycle-equivalence class; the traversal is O(E) and the list-append operation can be done in constant time. We then iterate through the cycle-equivalence classes and the edge list of each, constructing SESE regions. No edge can be on more than one list, so this step is O(E). Finally, we do an O(E) depth-first traversal of G, performing the constant-time operations append and LinkRegion. All steps are O(E), and so is their sequential composition.



8 In addition, the regions c, d, e and f, g are sequentially composed [19]. However, our 
SSI construction algorithm doesn't use this property. 
Data type EdgeList:

size(el:EdgeList): integer : Number of elements in EdgeList structure
head(el:EdgeList): edge : First edge in el
tail(el:EdgeList): EdgeList : EdgeList like el but missing first element
append(el:EdgeList, e:edge): EdgeList : Add e to the end of el

Data type Region:

NewRegion(e1:edge, e2:edge): Region : Creates a new region with entry e1 and exit e2 and no parent
Entry(r:Region): Edge : The entry edge of r
Exit(r:Region): Edge : The exit edge of r
Parent(r:Region): Region : The parent of r, or nil if none
Nodes(r:Region): NodeList : A list of nodes in r
LinkRegion(r1,r2:Region): void : Sets the parent of r2 to be r1

Operations on nodes:

Mark(n:node): boolean : Visited status during DFS
SESE(n:node): Region : The canonical SESE of n

Operations on edges:

EntryRegion(e:edge): Region : The region with entry e, or nil if none exists
ExitRegion(e:edge): Region : The region with exit e, or nil if none exists

Figure 5.6: Datatypes and operations used in construction of the PST. 
NestedSESE(G: CFG)
  /* initialize */
  for all nodes n of G do
    Mark(n) <- false
  for all edges e of G do
    EntryRegion(e) <- nil
    ExitRegion(e) <- nil
  /* order edges within cycle-equivalency classes by dominance */
  for each edge e of G in depth-first order do
    CQList(EQClass(e)) <- append(CQList(EQClass(e)), e)
  /* get all canonical SESE regions */
  for all equivalency classes q do
    l <- CQList(q)
    while size(l) > 1 do
      r <- NewRegion(head(l), head(tail(l)))
      EntryRegion(Entry(r)) <- r
      ExitRegion(Exit(r)) <- r
      l <- tail(l)
  /* determine proper nesting of SESE regions */
  VisitNode(START, top-region)

VisitNode(n: node, r: Region) =
  if Mark(n) = false then
    Mark(n) <- true
    /* record mapping from n to r */
    SESE(n) <- r
    Nodes(r) <- append(Nodes(r), n)
    for each edge (n, n') from n to n' do
      r1 <- EntryRegion((n, n'))
      r2 <- ExitRegion((n, n'))
      if r = r1 or r = r2 then
        rN <- Parent(r) /* exiting current region */
      else
        rN <- r
      if r1 ≠ nil and r1 ≠ r then
        LinkRegion(rN, r1) /* entering new region */
        rN <- r1
      if r2 ≠ nil and r2 ≠ r then
        LinkRegion(rN, r2) /* entering new region */
        rN <- r2
      VisitNode(n', rN)

Algorithm 5.2: Computing nested SESE regions and the PST.





Figure 5.7: SESE regions and PST for the CFG of Figure 5.5 (from [19]). 
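The "get all canonical SESE regions" phase of Algorithm 5.2 amounts to pairing consecutive edges of each dominance-ordered class. A minimal Python sketch follows, assuming the CFG is given as a successor map and that the cycle-equivalence class of every edge has already been computed (the eq_class dictionary is an assumed input, and the function names are hypothetical). Zipping each list with its own tail plays the role of the head/tail loop.

```python
def dfs_edge_order(succ, start):
    """Edges in the order a depth-first search from start first examines them."""
    order, seen = [], set()

    def visit(n):
        seen.add(n)
        for m in succ.get(n, []):
            order.append((n, m))
            if m not in seen:
                visit(m)

    visit(start)
    return order


def canonical_sese(succ, start, eq_class):
    """Return (entry, exit) edge pairs of the canonical SESE regions."""
    cq_list = {}
    for e in dfs_edge_order(succ, start):       # dominance order within a class
        cq_list.setdefault(eq_class[e], []).append(e)
    regions = []
    for edges in cq_list.values():
        # each adjacent pair of edges in a class bounds one canonical region
        regions.extend(zip(edges, edges[1:]))
    return regions
```

For a diamond START, a, {b, c}, d, END this yields one region around the whole diamond and one region around each of b and c; computing their nesting is the job of VisitNode.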



5.3.3 Placing φ- and σ-functions

As with the presentation of SSA form in [11], we split construction of SSI form into two parts: placing φ- and σ-functions and renaming variables. The placement algorithm runs in $O(NV_0)$ time, and is presented as Algorithm 5.3 on the next page. No new node properties or datatypes are required; however, it is parameterized on a function called MaybeLive. For minimal SSI form, MaybeLive should always return true. If pruned SSI form is the goal, faster practical run-times may be obtained by allowing MaybeLive to return any conservative approximation of variable liveness, which allows unused φ- and σ-functions to be suppressed early. MaybeLive need not be precise: a conservative approximation (one that may return true where a variable is dead, but never false where it is live) only results in an excess of φ- and σ-functions, not an invalid SSI form. Section 5.3.6 describes a post-processing algorithm to efficiently remove the
Place(G: CFG)
  let r be the top-level region for G
  for each variable v in G do
    PlaceOne(r, v, false) /* place phis */
    PlaceOne(r, v, true)  /* place sigmas */

PlaceOne(r: region, v: variable, ps: boolean): boolean
  /* Post-order traversal */
  flag <- false
  for each child region r' do
    if PlaceOne(r', v, ps) then
      flag <- true
  for each node n in region r not contained in a child region do
    if ps is false and n contains a definition of v then
      flag <- true
    if ps is true and n contains a use of v then
      flag <- true
  /* add phis/sigmas to merges/splits where v may be live */
  if flag = true then
    for each node n in region r not contained in a child region do
      if MaybeLive(v, n) = true then
        if ps is false and the input arity of n exceeds 1 then
          place a phi function for v at n
        if ps is true and the output arity of n exceeds 1 then
          place a sigma function for v at n
  return flag

Algorithm 5.3: Placing φ- and σ-functions.
excess φ- and σ-functions. 9 The remainder of this section is devoted to a correctness proof of Algorithm 5.3.

Lemma 5.1. No φ-functions (σ-functions) for a variable v are needed in an SESE region not containing a definition (use) of v.

Proof. Let us assume a φ-function for v is needed at some node Z inside an SESE not containing a definition of v. Then by condition 1 of the SSI form definition, there exist paths $X \xrightarrow{+} Z$ and $Y \xrightarrow{+} Z$ having no nodes but Z in common, where X and Y contain either definitions of v or φ- or σ-functions for v. Choose any such paths:

Case I: Both X and Y are outside the SESE. Then, as there is only one entrance edge into the SESE, the paths $X \xrightarrow{+} Z$ and $Y \xrightarrow{+} Z$ must contain some node in common other than Z. But this contradicts our choice of X and Y.

Case II: At least one of X and Y is inside the SESE. If neither X nor Y is a definition of v, both being φ- or σ-functions for v, then by recursive application of this proof there must exist some choice of X, Y, and Z inside this SESE where at least one of X and Y is a definition. But neither X nor Y can be a definition of v, because they are inside the SESE of Z, which was chosen to contain no definitions of v.

A symmetric argument holds for σ-functions for v, using condition 2 of the SSI form definition and the fact that the SESE has exactly one exit edge. □



9 Note that equivalent results could be obtained by adding a φ-function for every variable at every merge and a σ-function for every variable at every split, and then post-processing. In fact, the same time bound ($O(NV_0)$) would be obtained. There is a large practical difference in actual run-time and space costs, however, which motivates our more efficient approach.
The above lemma justifies line 14 of the algorithm on page 33, which skips over any SESE region not containing a definition (use) of v when placing φ-functions (σ-functions) for v.

Lemma 5.2. If a definition (use) or a φ- or σ-function for a variable v is present at some node D (U), then a φ-function (σ-function) for v is needed at every node N:

1. of input (output) arity greater than 1,

2. reachable from D (from which U is reachable),

3. whose smallest enclosing SESE contains D (U), and

4. which is not dominated by D (not post-dominated by U).

Proof. We will first prove that a node N failing any one of the conditions does not need a φ- or σ-function.

• Conditions 1 and 2 of the SSI form definition require node N to be the first convergence (divergence) of some paths $X \xrightarrow{+} N$ and $Y \xrightarrow{+} N$ ($N \xrightarrow{+} X$ and $N \xrightarrow{+} Y$). If the input arity is less than 2 or there is no path from a definition of v, then N fails φ-placement criterion 1. If the output arity is less than 2 or there is no path to a use of v, then it fails σ-placement criterion 2.

• If there exists an SESE containing N that does not contain any definition, φ- or σ-function D for v, then N does not require a φ- or σ-function for v, by lemma 5.1.

• Let us suppose every $D_i$ containing a definition, φ- or σ-function for v dominates N. If N requires a φ-function for v, there exist paths $D_1 \xrightarrow{+} N$ and $D_2 \xrightarrow{+} N$ containing no nodes in common but N. We use these paths to construct paths $\mathrm{START} \xrightarrow{*} D_1 \xrightarrow{+} N$ and $\mathrm{START} \xrightarrow{*} D_2 \xrightarrow{+} N$, taking each prefix $\mathrm{START} \xrightarrow{*} D_i$ to be simple. By the definition of a dominator, every path from START to N must contain every $D_i$. But $D_1 \xrightarrow{+} N$ cannot contain $D_2$, so if $\mathrm{START} \xrightarrow{*} D_1$ does not contain $D_2$ either, the first path avoids $D_2$. If instead $\mathrm{START} \xrightarrow{*} D_1$ contains $D_2$, its prefix $\mathrm{START} \xrightarrow{*} D_2$ does not contain $D_1$ (the path is simple), and appending the $D_1$-free path $D_2 \xrightarrow{+} N$ yields a path from START to N that avoids $D_1$. Either way the assumption leads to a contradiction; thus, there must exist some $D_i$ which does not dominate N if N is required to have a φ-function for v. The symmetric argument holds for post-dominance and σ-functions.

This proves that the conditions are necessary. It is clear from an examination of conditions 1 and 2 of the SSI form definition and lemma 5.1 that they are sufficient. □

In practice, the conditions of lemma 5.2 are too expensive to implement directly. Instead, we use a conservative approximation to SSI form, which allows us to place more φ- and σ-functions than minimal SSI requires (for example, a φ-function for v at the circled node in Figure 5.8), while satisfying the conditions of the SSI form definition. Our algorithm also allows us to do pre-pruning of the SSI form during placement. The result is not pruned SSI, but contains a tight superset of the φ- and σ-functions that pruned form requires.

Theorem 5.1. Algorithm 5.3 places all the φ- and σ-functions required by conditions 1 and 2 of the SSI form definition.

Proof. Lemma 5.1 states that the child region exclusion of Algorithm 5.3 does not cause required φ- or σ-functions to be omitted. Within every region that is not excluded, Algorithm 5.3 places a φ-function (σ-function) for v at every node of input (output) arity greater than 1 not contained in a child region, subject only to MaybeLive. Property 5.4 allows the omission of φ- and σ-functions for v at nodes where v is dead when creating pruned form; MaybeLive must not return false at nodes where v is live, but may return true at nodes where v is dead without harming the correctness of the φ- and σ-function placement. □

Figure 5.8: A flowgraph where Algorithm 5.3 places φ-functions conservatively.
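Algorithm 5.3 translates almost line for line into Python. The sketch below is illustrative only: it assumes the PST has already been built, models a region as the list of nodes it contains directly plus its child regions, and records placements by adding variable names to per-node sets rather than inserting real φ- and σ-functions. The Node and Region classes and all field names are hypothetical; maybe_live defaults to the always-true function used for minimal SSI form.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Node:
    name: str
    defs: Set[str] = field(default_factory=set)    # variables defined here
    uses: Set[str] = field(default_factory=set)    # variables used here
    in_arity: int = 1
    out_arity: int = 1
    phis: Set[str] = field(default_factory=set)    # variables given a phi here
    sigmas: Set[str] = field(default_factory=set)  # variables given a sigma here


@dataclass
class Region:
    nodes: List[Node]                              # nodes not in any child region
    children: List["Region"] = field(default_factory=list)


MaybeLive = Callable[[str, Node], bool]


def place_one(r: Region, v: str, ps: bool, maybe_live: MaybeLive) -> bool:
    flag = False
    for child in r.children:              # post-order: visit every child first
        if place_one(child, v, ps, maybe_live):
            flag = True
    for n in r.nodes:
        if (not ps and v in n.defs) or (ps and v in n.uses):
            flag = True
    if flag:                              # otherwise lemma 5.1 lets us skip r
        for n in r.nodes:
            if maybe_live(v, n):
                if not ps and n.in_arity > 1:
                    n.phis.add(v)
                if ps and n.out_arity > 1:
                    n.sigmas.add(v)
    return flag


def place(top: Region, variables, maybe_live: MaybeLive = lambda v, n: True) -> None:
    for v in variables:
        place_one(top, v, False, maybe_live)   # place phis
        place_one(top, v, True, maybe_live)    # place sigmas
```

With MaybeLive always true, a definition of x inside the child region of a diamond's arm is enough to flag the enclosing region, so x receives a φ-function at the merge and, if the merge uses x, a σ-function at the split. A variable that is never defined or used receives neither.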



5.3.4 Computing liveness 

Incorporating liveness information into the creation of pruned SSI form appears to lead to a chicken-and-egg problem: although the pruned SSI framework allows highly efficient liveness analysis, obtaining the liveness information from the original program can be problematic. The fastest sparse algorithm has stated time bounds of O(E + N^2) [7], which is likely to be more expensive than the rest of the SSI form conversion. Fortunately, Kam and Ullman [21], in conjunction with an empirical study by Knuth [23], show that liveness analysis is highly likely to take linear time on reducible flow graphs. In our work this question is avoided, as we obtain our liveness information directly from properties of the Java bytecode files that are our input to the compiler. In any case, our algorithms allow a conservative approximation to liveness, so even for non-reducible flow graphs it should not be difficult to generate a rough approximation quickly.
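Where no liveness information comes with the input, a MaybeLive can be derived from the textbook iterative backward dataflow equations, live-in(n) = use(n) ∪ (live-out(n) − def(n)) and live-out(n) = ∪ live-in(s) over the successors s of n, iterated to a fixed point. The Python sketch below is that standard round-robin solver, not the sparse algorithm of [7]; the dictionary-based CFG encoding and the function names are our own assumptions. The resulting MaybeLive never returns false where a variable is live, which is all the placement algorithm requires.

```python
from typing import Dict, List, Set, Tuple


def liveness(succ: Dict[str, List[str]],
             use: Dict[str, Set[str]],
             defs: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Round-robin iterative liveness; every node must appear as a key of succ."""
    nodes = list(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        # backward problem: sweeping in reverse listed order tends to converge faster
        for n in reversed(nodes):
            out = set().union(*(live_in[s] for s in succ[n]))
            inn = use.get(n, set()) | (out - defs.get(n, set()))
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out


def make_maybe_live(succ, use, defs):
    """A MaybeLive(v, n) for Algorithm 5.3 built from the fixed point above."""
    live_in, live_out = liveness(succ, use, defs)
    return lambda v, n: v in live_in[n] or v in live_out[n]
```

For a counting loop where node 1 sets i, node 2 tests i and node 3 increments it, i comes out live into the loop header and around the back edge, and dead at END.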




Rename(G: CFG)
1: Init(G)
2: for each edge e leaving START do
3:   Search(e)

Init(G: CFG)
  for each edge e in G do
    Marked[e] <- false
  for each variable V in G do
    C(V) <- 0
  ℰ <- create() /* create a new environment */

Inc(ℰ: Environment, V: variable): variable
1: i <- C(V) + 1
2: C(V) <- i
3: put(ℰ, V, V_i)
4: return V_i

Algorithm 5.4: SSI renaming algorithm.

5.3.5 Variable renaming 

Algorithm 5.4 performs variable renaming on a flow graph with placed φ- and σ-functions in a single depth-first traversal. When the algorithm is complete, the control-flow graph will be in proper SSI form. The variable renaming algorithm requires an Environment datatype, which is defined in Figure 5.9. Using an imperative programming style, any sequence of N operations on Environment as defined in the figure can be performed in O(N) time; in a functional programming style, any N operations can be completed in O(N log N) time. 10 As the coarse structure of Algorithm 5.4 is a simple depth-first search, it is easy to see that the Search procedure can be invoked from line 3 on page 38 and line 32 on page 39 a total of



10 The curious reader is referred to section 5.1 of Appel [4] for implementation details. 
Search((s, d): edge) =
Require: s to be a node containing φ- or σ-functions, or START
Require: Marked[(s, d)] = false
1: Marked[(s, d)] <- true
2: beginScope(ℰ)
3: if s is a node containing φ-functions then
4:   for each φ-function P in s do
5:     replace the destination V of P by Inc(ℰ, V)
6: else if s is a node containing σ-functions then
7:   for each σ-function S in s do
8:     j <- WhichSucc((s, d))
9:     replace the j-th destination V of S by Inc(ℰ, V)
10: loop /* now rename inside basic block */
11:   if d is a node containing φ-functions then
12:     for each φ-function P in d do
13:       j <- WhichPred((s, d))
14:       replace the j-th operand V of P by get(ℰ, V)
15:     break /* end of basic block */
16:   else if d is a node containing σ-functions then
17:     for each σ-function S in d do
18:       replace the operand V of S by get(ℰ, V)
19:     break /* end of basic block */
20:   /* ordinary assignment, at most one successor */
21:   for each variable V in RHS(d) do
22:     replace V by get(ℰ, V) in RHS(d)
23:   for each variable V in LHS(d) do
24:     replace V by Inc(ℰ, V) in LHS(d)
25:   if d has no successor then
26:     break /* end of basic block */
27:   s <- d
28:   d <- successor of d
29: end loop
30: for each successor n of d do
31:   if not Marked[(d, n)] then
32:     Search((d, n)) /* dfs recursion */
33: endScope(ℰ)
34: return

Algorithm 5.5: SSI renaming algorithm, cont.
O(E) times; likewise, its inner loop (lines 10 to 29) can be executed a total of E times across all invocations of Search. A total of $U_{SSA} + D_{SSA}$ calls to the operations of the Environment datatype will be made within all executions of Search. For the imperative implementation of Environment, a total time bound of $O(E + U_{SSA} + D_{SSA})$ for the variable renaming algorithm is obtained.
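The O(N) bound for the imperative style can be obtained with an undo log: put records the binding it overwrites, beginScope pushes a marker, and endScope pops entries back to the most recent marker. Each put is undone at most once, so N operations cost O(N) in total. The Python sketch below is one possible rendering of Figure 5.9 under that assumption (not necessarily the structure described by Appel [4]); begin_scope and end_scope are our snake_case names for beginScope and endScope, and inc mirrors Inc from Algorithm 5.4 with the counter C passed in explicitly.

```python
_ABSENT = object()   # "v1 had no mapping before this put"
_MARK = object()     # scope boundary pushed by begin_scope


class Environment:
    """Scoped mapping from original variables to SSI names (Figure 5.9)."""

    def __init__(self) -> None:          # create()
        self._map = {}
        self._undo = []                  # (variable, previous value) pairs and marks

    def put(self, v1: str, v2: str) -> None:
        self._undo.append((v1, self._map.get(v1, _ABSENT)))
        self._map[v1] = v2

    def get(self, v: str) -> str:
        return self._map[v]

    def begin_scope(self) -> None:
        self._undo.append(_MARK)

    def end_scope(self) -> None:
        # undo every put made since the matching begin_scope, newest first
        while True:
            entry = self._undo.pop()
            if entry is _MARK:
                return
            v, old = entry
            if old is _ABSENT:
                del self._map[v]
            else:
                self._map[v] = old


def inc(env: Environment, counter: dict, v: str) -> str:
    """Inc of Algorithm 5.4: make the next subscripted name for v current."""
    counter[v] += 1
    name = f"{v}{counter[v]}"
    env.put(v, name)
    return name
```

Note that the counter is never rolled back by end_scope, so names stay unique across sibling branches of the depth-first search.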

We have shown that Algorithm 5.3 places all the required φ- and σ-functions in the control-flow graph according to SSI form conditions 1, 2, and 5; we will now show that Algorithm 5.4 renames variables consistently with conditions 3, 4, and 6, proving that these algorithms combined suffice to convert a program into SSI form. The SSI form is not necessarily minimal, as we showed in section 5.3.3; the next section shows how to post-process it to create minimal or pruned SSI form.

Lemma 5.3. The stack trace of calls to Search defines a unique path 
through G from START. 

Proof. We will prove this lemma by construction. For every consecutive pair of calls to Search we construct a path $X \xrightarrow{+} Y$ starting with the edge $(X, N_1)$, which is the argument of the first call, and ending with the edge $(N_n, Y)$, which is the argument of the second call. From line 28 of the Search procedure on page 39 we note that each edge $(N_i, N_{i+1})$ between the first and the last is the only edge leaving $N_i$. Furthermore, the call to Search on line 32 defines a path starting with the edge with which our segment $X \xrightarrow{+} Y$ ends; therefore the paths can be combined. By doing so from the bottom of the call stack to the top, we construct a unique path from START. □

For brevity, we will hereafter write CP(e) for the canonical path, constructed in the manner of lemma 5.3, that corresponds to the stack of calls to Search when an edge e is first encountered. Every edge in the CFG is encountered exactly once by Search, so CP(e) exists and is unique for every edge e in the CFG.

Lemma 5.4. SSI form condition 3 (φ-function naming) holds for variables renamed according to Algorithm 5.4.

Proof. We restate SSI form condition 3 for reference:

For every node X containing a definition of a variable V in the new program and node Y containing a use of that variable, there exists at least one path $X \xrightarrow{+} Y$ and no such path contains a definition of V other than at X.

Consider a use of a variable v at Y, and the canonical path $CP((Y', Y)) = \mathrm{START} \xrightarrow{*} Y' \to Y$ constructed according to lemma 5.3 from the stack trace of calls to Search at the point where the edge (Y', Y) is first encountered. This path is unique, although more than one canonical path may terminate at a node Y with more than one predecessor. These paths are distinguished by the incoming edge to Y. 11 We identify each operand $v_i$ of a φ-function with the appropriate incoming edge e to ensure that CP(e) is well defined and unique in the context of a use of $v_i$.

The canonical path $\mathrm{START} \xrightarrow{+} Y$ must contain X, a definition of v, if Y uses a variable defined in X, as Search renames all definitions (in lines 5, 9, and 24) and destroys the name mapping in $\mathcal{E}$ just before it returns. The call to Search which creates the definition of v must therefore always be on the stack, and thus in the path CP((Y', Y)), for any use to receive the



11 Note that the notation (N, N') does not always denote an edge unambiguously; imagine a conditional branch where both the true and false cases lead to the same label. In such cases an additional identifier is necessary to distinguish the edges. Alternatively, one may split such edges to remove the ambiguity. We treat edges as uniquely identifiable and leave the implementation to the reader.
name v. Note that this is true for φ-functions as well, which receive names when the appropriate incoming edge (Y', Y) is traversed, not necessarily when the node Y containing the φ-function is first encountered.

We have proved that a path $\mathrm{START} \xrightarrow{+} X \xrightarrow{+} Y$ exists; now we must prove that no other path from X to Y contains a definition of v. Call this other definition D. Clearly D cannot be on our canonical path $\mathrm{START} \xrightarrow{+} X \xrightarrow{+} Y$, or line 24 would have caused Y to use a different name. But as we just stated, all variable name mappings made at D are removed when the call to Search which touched D is taken off the call stack. So D must be on the call stack, and thus on the canonical path: a contradiction. Since assuming the existence of some other path $X \xrightarrow{+} Y$ containing a definition of v leads to a contradiction, no such path may exist, completing the proof of the lemma. □

Lemma 5.5. SSI form condition 4 (σ-function naming) holds for variables renamed according to Algorithm 5.4.

Proof. We restate SSI form condition 4 for reference:

For every pair of nodes X and Y containing uses of a variable V defined at node Z in the new program, either every path $Z \xrightarrow{+} X$ must contain Y or every path $Z \xrightarrow{+} Y$ must contain X.

Let us assume there are paths $Z \xrightarrow{+} X$ and $Z \xrightarrow{+} Y$ violating this condition; that is, let us choose nodes X and Y which use V and Z defining V such that there exists a path $P_1$ from Z to X not containing Y and a path $P_2$ from Z to Y not containing X. By the argument of the previous lemma, there exists a canonical path $P_3 = CP(e)$ from START to X through Z corresponding to a stack trace of Search; note that $P_3$ need not contain $P_1$. There are two cases:




Case I: $P_3$ does not contain Y. Then there is some last node N present on both $P_2: Z \xrightarrow{*} N \xrightarrow{+} Y$ and $P_3: \mathrm{START} \xrightarrow{+} Z \xrightarrow{*} N \xrightarrow{+} X$. By SSI condition 2 this node N requires a σ-function for V. If $N \neq Z$, then line 5 of Algorithm 5.4 would rename V along $P_3$ and X would not use the same variable Z defined; if $N = Z$, then line 9 would have ensured that X and Y used different names. Either case contradicts our choices of X, Y, and Z.

Case II: $P_3$ does contain Y. Then consider the path $\mathrm{START} \xrightarrow{+} Z \xrightarrow{+} Y$ along $P_3$, which does not contain X. The argument of case I applies with X and Y reversed.

Any assumed violation of condition 4 leads to contradiction, proving the 
lemma. □ 

Every path CP(e) corresponds to an execution state in a call to Search at the point where e is first encountered. We denote the value of the environment mapping $\mathcal{E}$ at this point in the execution of Algorithm 5.4 as $\mathcal{E}_e$. For a node N having a single predecessor $N_p$ and a single successor $N_s$, we denote $\mathcal{E}_{(N_p, N)}$ as $\mathcal{E}^{before}_N$ and $\mathcal{E}_{(N, N_s)}$ as $\mathcal{E}^{after}_N$. It is obvious that $\mathcal{E}^{after}_{N_p} = \mathcal{E}^{before}_N$ and $\mathcal{E}^{after}_N = \mathcal{E}^{before}_{N_s}$ when $N_p$ and $N_s$, respectively, are also single-predecessor single-successor nodes.

Lemma 5.6. SSI form condition 6 (correctness) holds for variables renamed according to Algorithm 5.4. That is, along any possible control-flow path in a program being executed, a use of a variable $V_i$ in the new program will always have the same value as a use of the corresponding variable V in the original program.

Proof. We will use induction along the path $N_0 \to N_1 \to \cdots \to N_n$. We consider $e_k = (N_k, N_{k+1})$, the $(k+1)$th edge in the path, and assume that, for all $j < k$, each variable V in the original program agrees with the value of $\mathcal{E}_{e_j}[V] = V_i$ in the new program. We show that $\mathcal{E}_{e_k}[V]$ agrees with V at edge $e_k$ in the path.

Case I: $k = 0$. The base case is trivial: the START node ($N_0$) contains no statements, and along each edge e leaving START, $\mathcal{E}_e[V] = V_0$. By definition $V_0$ agrees with V at the entry to the procedure.

Case II: $k > 0$ and $N_k$ has exactly one predecessor and one successor. If $N_k$ is single-entry single-exit, then it contains no φ- or σ-functions. As an ordinary assignment, it will be handled by lines 20 to 24 of Algorithm 5.5 on page 39. By the induction hypothesis (which tells us that the uses at $N_k$ correspond to the same values as the uses in the original program) and the semantics of assignment, the mapping $\mathcal{E}^{after}_{N_k}$ is easily verified to be valid when $\mathcal{E}^{before}_{N_k}$ is valid. Thus the value of every original variable V corresponds to the value of the new variable $\mathcal{E}^{after}_{N_k}[V] = \mathcal{E}_{e_k}[V]$.

Case III: $k > 0$ and $N_k$ has multiple predecessors and one successor. In this case $N_k$ may have multiple φ-functions in the new program, and by the definition in section 3 $N_k$ has no statements in the original program. Thus the value of any variable V in the original program along edge $e_k$ is identical to its value along edge $e_{k-1}$. We need only show that the value of the variable $\mathcal{E}_{e_{k-1}}[V]$ is the same as the value of the variable $\mathcal{E}_{e_k}[V]$ in the new program. For any variable V not mentioned in a φ-function at $N_k$ this is obvious. Each variable defined in a φ-function will get the value of the operand corresponding to the incoming control-flow path edge. The relevant lines in Algorithm 5.5 start with 13 and 14, where we see that the operand corresponding to edge $e_{k-1}$ of a φ-function for V correctly gets $\mathcal{E}_{e_{k-1}}[V]$. At line 5, we see that the destination of the φ-function is correctly $\mathcal{E}_{e_k}[V]$. Thus the value of every original variable V correctly corresponds to $\mathcal{E}_{e_k}[V]$ by the induction hypothesis and the semantics of the φ-functions.

Case IV: $k > 0$ and $N_k$ has one predecessor and multiple successors. Here $N_k$ may have multiple σ-functions in the new program, and is empty in the original program. The argument goes as for the previous case. Variables not mentioned in the σ-functions clearly correspond at $e_k$ if they did at $e_{k-1}$. For variables mentioned in σ-functions, line 18 shows that operands correctly get $\mathcal{E}_{e_{k-1}}[V]$ and line 9 shows that the destination corresponding to $e_k$ correctly gets $\mathcal{E}_{e_k}[V]$. Therefore the values of original variables V correspond to the value of $\mathcal{E}_{e_k}[V]$ by the induction hypothesis and the semantics of the σ-functions.

Case V: $N_k$ has multiple predecessors and multiple successors. Forbidden by the CFG definition in section 3.

Therefore, on every edge of the chosen path, the values of the original vari- 
ables correspond to the values of the renamed SSI form variables. The value 
correspondence at the path endpoint (a use of some variable V) follows. □ 

Theorem 5.2. Algorithm 5.4 renames variables such that SSI form 
conditions 3, 4, and 6 hold. 

Proof. Direct from lemmas 5.4, 5.5, and 5.6. □ 

Theorem 5.3. Algorithms 5.3 and 5.4 correctly transform a program 
into SSI form. 

Proof. Theorem 5.1 proves that φ- and σ-functions are placed correctly to satisfy conditions 1, 2 and 5 of the SSI form definition, and theorem 5.2 proves that variables are renamed correctly to satisfy conditions 3, 4 and 6. □
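To make the renaming pass concrete, here is a Python sketch of Search on a deliberately simplified CFG model of our own: every node is either a φ-node, a σ-node, or a node holding at most one ordinary assignment, so each "basic block" of Algorithm 5.5 is a chain of single-statement nodes. New names are recorded in a per-node ren dictionary instead of rewriting statements, beginScope and endScope are modelled by copying and restoring a plain dictionary (which costs O(V) per call and so does not achieve the bound derived above), and every variable starts out mapped to its 0-subscripted name. All class, field and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    kind: str                                         # "stmt", "phi" or "sigma"
    succ: List[int] = field(default_factory=list)     # successor node ids
    pred: List[int] = field(default_factory=list)     # predecessor ids (phi operand order)
    lhs: Optional[str] = None                         # stmt: variable defined, if any
    rhs: List[str] = field(default_factory=list)      # stmt: variables used
    fn_vars: List[str] = field(default_factory=list)  # phi/sigma: variables with a function here
    ren: dict = field(default_factory=dict)           # renaming results


def rename(g: Dict[int, Node], variables: List[str], start: int = 0) -> None:
    counter = {v: 0 for v in variables}               # C(V)
    env = {v: f"{v}0" for v in variables}             # every V starts as V0
    marked = set()

    def inc(v: str) -> str:
        counter[v] += 1
        env[v] = f"{v}{counter[v]}"
        return env[v]

    def search(s: int, d: int) -> None:
        marked.add((s, d))
        saved = dict(env)                             # beginScope
        if g[s].kind == "phi":                        # name phi destinations
            for v in g[s].fn_vars:
                g[s].ren[("dst", v)] = inc(v)
        elif g[s].kind == "sigma":                    # name the j-th sigma destinations
            j = g[s].succ.index(d)
            for v in g[s].fn_vars:
                g[s].ren[("dst", v, j)] = inc(v)
        while True:                                   # walk the basic block
            n = g[d]
            if n.kind == "phi":                       # fill the j-th phi operand
                j = n.pred.index(s)
                for v in n.fn_vars:
                    n.ren[("op", v, j)] = env[v]
                break
            if n.kind == "sigma":                     # fill the sigma operand
                for v in n.fn_vars:
                    n.ren[("op", v)] = env[v]
                break
            n.ren["rhs"] = [env[v] for v in n.rhs]    # uses first, then the definition
            if n.lhs is not None:
                n.ren["lhs"] = inc(n.lhs)
            if not n.succ:
                break
            s, d = d, n.succ[0]
        for m in g[d].succ:                           # depth-first recursion
            if (d, m) not in marked:
                search(d, m)
        env.clear()
        env.update(saved)                             # endScope

    for d in g[start].succ:
        search(start, d)
```

On a diamond where x is defined before a σ-node and redefined on one arm, the σ-function gives x a distinct name on each successor edge, and the φ-function at the merge receives the name live on each incoming edge as its corresponding operand.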
5.3.6 Pruning SSI form

The SSI algorithm can be run using any conservative approximation to the liveness information (including the function MaybeLive(v,n) = true) if unused code elimination 12 is performed afterwards to remove the extra φ- and σ-functions and create pruned SSI form. Figure 5.10 and Algorithm 5.6 present an algorithm to identify unused code in $O(NV_{SSI})$ time, after which a simple O(N) pass suffices to remove it. The complexity analysis is simple: nodes and variables are visited at most once, raising their value in the analysis lattice from unused to used. Nodes marked used are never visited again. So MarkNodeUseful is invoked at most N times, and MarkVarUseful is invoked at most $V_{SSI}$ times. The calls to MarkNodeUseful may examine at most every variable use in the program in lines 3-5, taking $O(U_{SSI})$ time at worst. Each call to MarkVarUseful examines at most one node (the single definition node for the variable, if it exists) and in constant time pushes at most one node onto the worklist, for a total of $O(V_{SSI})$ time. So the total run time of FindUseful is $O(U_{SSI} + V_{SSI}) = O(U_{SSI})$.
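The worklist of Algorithm 5.6 can be sketched in Python as follows. The encoding is an assumption made for the example: nodes are arbitrary hashable ids, uses maps a node to the set of SSI variables it uses, definitions maps a variable to the set of nodes defining it (at most one in SSI form), and has_side_effects stands in for the CALL/RETURN test on line 6. The membership check when a node is popped is our addition; it keeps a node that was queued twice from being processed twice.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple


def find_useful(nodes: Iterable[Hashable],
                uses: Dict[Hashable, Set[str]],
                definitions: Dict[str, Set[Hashable]],
                has_side_effects: Callable[[Hashable], bool]) -> Tuple[set, set]:
    """Return (useful nodes, useful variables); everything else is unused code."""
    node_useful: set = set()
    var_useful: set = set()
    work = [n for n in nodes if has_side_effects(n)]   # seed with side effects
    while work:
        n = work.pop()
        if n in node_useful:
            continue
        node_useful.add(n)                              # MarkNodeUseful
        for v in uses.get(n, ()):
            if v not in var_useful:
                var_useful.add(v)                       # MarkVarUseful
                for dn in definitions.get(v, ()):       # the (single) definition
                    if dn not in node_useful:
                        work.append(dn)
    return node_useful, var_useful
```

For the straight-line program x1 = 5; y1 = x1 + 1; z1 = 7; return y1, only the assignment to z1 is left unmarked and can be deleted by the subsequent O(N) pass.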

5.3.7 Discussion 

Note that our algorithm for placing φ- and σ-functions in SSI form is pessimistic; that is, we at first assume that every node in the control-flow graph with input arity larger than one requires a φ-function for every variable and every node with output arity larger than one requires a σ-function for every variable, and then use the PST, liveness information, and unused code elimination to determine safe places to omit φ- or σ-functions. Most



12 We follow [44] in distinguishing unreachable code elimination, which removes code 
that can never be executed, from unused code elimination, which deletes sections of 
code whose results are never used. Both are often called "dead code elimination" in the 
literature. 




Data type Environment:

create(): Environment :
    make an environment with no mappings.
put(ℰ: Environment, v1: variable, v2: variable) :
    extend environment ℰ with a mapping from v1 to v2.
get(ℰ: Environment, v: variable): variable :
    return the current mapping in ℰ for v.
beginScope(ℰ: Environment) :
    save the current mapping of ℰ for later restoration.
endScope(ℰ: Environment) :
    restore the mapping of ℰ to that present at the last beginScope on ℰ.

Figure 5.9: Environment datatype for the SSI renaming algorithm. 
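To make the scoping behaviour of Figure 5.9 concrete, here is a minimal Python sketch. It assumes an undo log (our choice, not necessarily the FLEX implementation), so that endScope costs time proportional to the mappings made since the matching beginScope; the snake_case method names are our renderings of the figure's operations.

```python
class Environment:
    """Scoped variable map for renaming (a sketch of Figure 5.9)."""

    _MISSING = object()  # marks "v1 had no mapping before this put()"

    def __init__(self):
        # create(): an environment with no mappings.
        self._map = {}
        # Undo log of (variable, previous mapping) pairs; a None entry
        # marks the point at which begin_scope() was called.
        self._log = []

    def put(self, v1, v2):
        """Extend the environment with a mapping from v1 to v2."""
        self._log.append((v1, self._map.get(v1, self._MISSING)))
        self._map[v1] = v2

    def get(self, v):
        """Return the current mapping for v (raises KeyError if none)."""
        return self._map[v]

    def begin_scope(self):
        """Save the current mappings for later restoration."""
        self._log.append(None)

    def end_scope(self):
        """Restore the mappings present at the last begin_scope()."""
        while True:
            entry = self._log.pop()
            if entry is None:
                return
            v, old = entry
            if old is self._MISSING:
                del self._map[v]
            else:
                self._map[v] = old
```

A renaming pass would typically call begin_scope() on entering a region and end_scope() on leaving it, so that names introduced inside the region do not leak out of it.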



Operations on nodes: 

NodeUseful(n:node): boolean : Whether the results of this node are ever used 
Uses(n:node): set of variables : Variables for which this node contains a use 

Operations on variables: 

VarUseful(v: variable): boolean : Whether there is some n for which Uses(n) contains v and NodeUseful(n) is true
Definitions(v:variable): set of nodes : Nodes which contain a definition for v 

Figure 5.10: Datatypes and operations used in unused code elimination. 
FindUseful(G: CFG) =
1:  let W be an empty work list
2:  for each variable v in G do
3:      VarUseful(v) ← false
4:  for each node n in G in any order do
5:      NodeUseful(n) ← false
6:      if n is a CALL, RETURN, or other node with side-effects then
7:          add n to W
8:
9:  while W is not empty do
10:     let n be any element from W
11:     remove n from W
12:     MarkNodeUseful(n, W)

MarkNodeUseful(n: node, W: WorkList) =
1:  NodeUseful(n) ← true
2:  /* everything used by a useful node is useful */
3:  for each variable v in Uses(n) do
4:      if not VarUseful(v) then
5:          MarkVarUseful(v, W)

MarkVarUseful(v: variable, W: WorkList) =
    VarUseful(v) ← true
    /* The definition of a useful variable is useful */
    for each node n in Definitions(v) do
        /* In SSI form, size(Definitions(v)) ≤ 1 */
        if not NodeUseful(n) then
            add n to W
Algorithm 5.6: Identifying unused code using SSI form. 
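A minimal Python sketch of Algorithm 5.6 follows, assuming a dictionary encoding of the CFG (the function and parameter names are hypothetical). MarkNodeUseful and MarkVarUseful are inlined, and the worklist is treated as a set by skipping nodes already marked useful, which preserves the once-per-node bound used in the complexity argument.

```python
from collections import defaultdict

def find_useful(nodes, uses, defs, has_side_effects):
    """Sketch of FindUseful (Algorithm 5.6) on a dictionary-encoded CFG.

    nodes            -- iterable of node ids
    uses[n], defs[n] -- sets of variables used / defined by node n
    has_side_effects -- predicate for CALL, RETURN, etc. (the roots)
    Returns (useful nodes, useful variables); everything else is unused.
    """
    # Definitions(v); in SSI form each set has at most one element.
    definitions = defaultdict(set)
    for n, vs in defs.items():
        for v in vs:
            definitions[v].add(n)

    node_useful, var_useful = set(), set()
    work = [n for n in nodes if has_side_effects(n)]
    while work:
        n = work.pop()
        if n in node_useful:       # treat W as a set: ignore duplicates
            continue
        node_useful.add(n)         # MarkNodeUseful
        for v in uses.get(n, ()):
            if v not in var_useful:
                var_useful.add(v)  # MarkVarUseful
                for d in definitions[v]:
                    if d not in node_useful:
                        work.append(d)
    return node_useful, var_useful
```

For instance, with a side-effecting node returning b, a node defining b from a, a node defining a, and an unrelated definition of c, only the definition of c is left out of the useful set; the subsequent $O(N)$ pass deletes it.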
Figure 5.11: A worst-case CFG for "optimistic" algorithms.

SSA construction algorithms, by contrast, are optimistic; they assume no
φ- or σ-functions are needed and attempt to determine where they are
provably necessary. In my experience, optimistic algorithms tend to have
poor time bounds because of the possibility of input graphs like the one
illustrated in Figure 5.11. Proving that all but two nodes require φ- and/or
σ-functions for the variable a in this example seems to inherently require
$O(N)$ passes over the graph; each pass can prove that φ- or σ-functions are
required only for those nodes adjacent to nodes tagged in the previous pass.
Starting with the circled node, the φ- and σ-functions spread one node left
on each pass. A pessimistic algorithm, on the other hand, assumes the
correct answer at the start, fails to show that any φ- or σ-functions can be
removed, and terminates in one pass.

5.4 Time and space complexity of SSI form 

Discussions of time and space complexity for sparse evaluation frameworks
in the literature often misleadingly call algorithms "linear" regardless of their
O-notation runtime bounds. A canonical example is [38], which
states that for SSA form, "the number of φ-nodes needed remains linear."
Typically Cytron [11] is cited; however, that reference actually reads:
Figure 5.12: Number of uses in SSI form as a function of procedure length (plot titled "Linearity of uses in SSI form").
Figure 5.13: Number of original variables as a function of procedure length.
For the programs we tested, the plot in [Figure 21 of Cytron's
paper] shows that the number of φ-functions is also linear in
the size of the original program.

It is important to note that Cytron's claim is based not on algorithmic
worst-case complexity bounds, but on empirical evidence. This reasoning is not
unjustified; Knuth [23] showed in 1974 that "human-generated" programs 
almost without exception show properties favorable to analysis; in particu- 
lar shallow maximum loop nesting depth. Wegman and Zadeck [44] clearly 
make this distinction by noting that: 

In theory the size [of the SSA form representation] can be O(EV), 
but empirical evidence indicates that the work required to com- 
pute the SSA graph is linear in the program size. 

Our worst-case space complexity bounds for SSI form are identical to SSA 
form — O(EV) — but in this section we will endeavour to show that typical 
complexities are likewise "linear in the program size." 

The total runtime for SSI placement and subsequent pruning, including
the time to construct the PST, is $O(E + NV + U_{SSI})$. For most programs
$E$ will be a small constant multiple of $N$; as Wegman and Zadeck
[44] note, most control flow graph nodes will have at most two successors.
For those graphs where $E$ is not $O(N)$, it can be argued that $E$ is the more
relevant measure of program complexity.¹³

Thus the "linearity" of our SSI construction algorithm rests on the
quantities $NV$ and $U_{SSI}$. Figures 5.12 and 5.13 present empirical data
for $V$ and $U_{SSI}$ on a sample of 1,048 Java methods. The methods varied
in length from 4 to 6,642 statements and were taken from the dynamic
call-graph of the FLEX compiler itself, which includes large portions of
the standard Java class libraries. Figure 5.12 shows convincingly that $U_{SSI}$
grows as $N$ for large procedures, and Figure 5.13 supports an argument
that $V_0$ grows very slowly and that the quantity $NV_0$ would tend to grow
as $N^{1.3}$. This would argue for a near-linear practical run-time.

13 We do not follow Cytron [11] in defining a new variable $R$ to denote $\max(N, E, \ldots)$,
since that would lead us, as it led him, to declare a worst-case complexity of $O(R^3)$ and
leave it to the reader to puzzle out whether $O(N^6)$ (!) is really being implied.

In contrast, Cytron's original algorithm for SSA form had theoretical
complexity $O(E + V_{SSA}|DF| + NV_{SSA})$. Cytron does not present empirical
data for $V_{SSA}$, but one can infer from the data he presents for "number of
introduced φ-functions" that $V_{SSA}$ behaves similarly to $V_{SSI}$; that is, it
grows as $N$, not as $V$. It is frequently pointed out¹⁴ that the $|DF|$ term,
the size of the dominance frontier, can be $O(N^2)$ for common programming
constructs (repeat-until loops), which indicates that the $V_{SSA}|DF|$ term
in Cytron's algorithm will be $O(N^2)$ at best and at times as bad as $O(N^3)$.

Note that the space complexity of SSI form, which may be $O(EV)$ in the
worst case (φ- and σ-functions for every variable inserted at every node), is
certainly not greater than $U_{SSI}$, and thus Figure 5.12 shows linear practical
space use.

6 Uses and applications of SSI 

The principal benefits of using SSI form are the ability to do predicated
and backward dataflow analyses efficiently. Predicated analysis means
that we can use information extracted from branch conditions and control
flow. The σ-functions in SSI form provide a variable naming that allows
us to sparsely associate the predication information with variable names
at control flow splits. The σ-functions also give SSI form a reverse symmetry
that allows efficient backward dataflow analyses like liveness and
anticipatability.

14 See Dhamdhere [12] for example.

In this section, we will briefly sketch how SSI form can be applied 
to backwards dataflow analyses, including anticipatability, an important 
component of partial redundancy elimination. We will then describe in de- 
tail our Sparse Predicated Typed Constant propagation algorithm, which 
shows how the predication information of SSI form may be used to advan- 
tage in practical applications, including the removal of array bounds and 
null-pointer checks. Lastly, we will describe an extension to SPTC that 
allows bitwidth analysis, and the possible uses of this information. 

6.1 Backward Dataflow Analysis 

Backward dataflow analyses are those in which information is propa- 
gated in the direction opposite that of program execution [29]. There is 
general agreement [20, 7, 45] that SSA form is unable to directly handle 
backwards dataflow analyses; liveness is often cited as a canonical exam- 
ple. 

However, SSI form allows the sparse computation of such backwards
properties. Liveness, for example, comes "for free" from pruned SSI form:
every variable is live in the region between its uses and its sole definition.
Property 5.2 states that every non-φ-function use of a variable is dominated by
the definition; Cytron [11] has shown that φ-functions will always be found
on the dominance frontier. Thus the live region between definition and use
can be enumerated with a simple depth-first search, taking advantage of
the topological sorting by dominance that DFS provides [29]. Because of
φ-function uses, the DFS will have to look one node past its spanning-tree
leaves to see the φ-functions on the dominance frontier; this does not
change the algorithmic complexity. 
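As a cross-check on the claim that liveness falls out of pruned SSI form, the sketch below computes a variable's live region directly. It walks backward from each use until it reaches the sole definition, a simpler, swapped-in alternative to the forward, dominance-ordered DFS described above; the graph encoding and the function name live_nodes are our own.

```python
def live_nodes(preds, def_node, use_nodes):
    """Nodes at whose entry a single-definition variable is live.

    preds     -- dict mapping each node to its list of CFG predecessors
    def_node  -- the node holding the variable's sole definition
    use_nodes -- nodes using the variable; for a phi-function use, pass
                 the predecessor node on the corresponding edge instead
    Assumes (as in strict SSA/SSI) every use is reached only via def_node.
    """
    live = set()
    stack = list(use_nodes)
    while stack:
        n = stack.pop()
        if n in live or n == def_node:
            continue              # the definition ends the live region
        live.add(n)
        stack.extend(preds.get(n, ()))
    return live
```

Each node is added to the live set at most once per variable, so the walk is linear in the size of the live region it discovers.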

Computation of other dataflow properties will use this same enumeration
routine to propagate values computed on the sparse SSI graph to the
intermediate nodes on the control-flow graph. Formally, we can say that
the dataflow property for variable v at node N is dependent only on the
properties at nodes D and U, defining and using v, for which there is a path
from D to U containing N. There is a "default" property which holds for nodes
on no such path from a definition to a use; for liveness the default property is
"not live." The remainder of this section will concentrate on the dataflow
properties at use and definition points.

A slightly more complicated backward dataflow property is very busy
expressions; this analysis is somewhat obsolete, as it serves to save code
space, not time. It is in turn related to partial and total anticipatability.

Definition 6.1. An expression e is very busy at a point P of the program
iff it is always subsequently used before it is killed [29].

Definition 6.2. An expression e is totally (partially) anticipatable at
a point P if, on every (some) path in the CFG from P to END, there is
a computation of e before an assignment to any of the variables in e [20].

Johnson and Pingali [20] show how to reduce these properties of ex- 
pressions to properties on variables. We will therefore consider properties 
BSY(v,N), ANT(v,N), and PAN(v,N) denoting very busy, totally antici- 
patable, and partially anticipatable variables v at some program point N. 

To compute BSY, we start with pruned SSI form. Any variable defined
in a φ- or σ-function is used at some point, by definition. So for statements
at a point P we have the rules:

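$$
\begin{array}{ll}
v = \ldots & \mathit{BSY}_{in}(v, P) = \mathit{false} \\
\ldots = v & \mathit{BSY}_{in}(v, P) = \mathit{true} \\
x = \phi(y_0, \ldots, y_n) & \mathit{BSY}_{in}(y_i, P) = \mathit{BSY}_{out}(x, P) \\
(x_0, \ldots, x_n) = \sigma(y) & \mathit{BSY}_{in}(y, P) = \bigwedge_{i=0}^{n} \mathit{BSY}_{out}(x_i, P)
\end{array}
$$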

Total anticipatability, in the single variable case, is identical to BSY. 
Partial anticipatability for a variable v at point P follows the rules: 

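$$
\begin{array}{ll}
v = \ldots & \mathit{PAN}_{in}(v, P) = \mathit{false} \\
\ldots = v & \mathit{PAN}_{in}(v, P) = \mathit{true} \\
x = \phi(y_0, \ldots, y_n) & \mathit{PAN}_{in}(y_i, P) = \mathit{PAN}_{out}(x, P) \\
(x_0, \ldots, x_n) = \sigma(y) & \mathit{PAN}_{in}(y, P) = \bigvee_{i=0}^{n} \mathit{PAN}_{out}(x_i, P)
\end{array}
$$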

The present section is concerned more with feasibility than the mechan- 
ics of implementation; we refer the interested reader to [29] and [20] for 
details on how to turn the efficient computation of BSY, PAN and ANT 
into practical code-hoisting and partial-redundancy elimination routines, 
respectively. 
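The sparse rules for BSY and PAN can be evaluated directly on SSI names. The Python sketch below is an illustration under stated assumptions: a hypothetical dictionary encoding of σ- and φ-functions, and a loop-free region (loops would need an iterative fixed-point solver rather than plain recursion). Passing `all` as the combining function yields BSY; passing `any` yields PAN.

```python
def anticipated(name, uses, sigma, phi, combine):
    """Sparse BSY (combine=all) or PAN (combine=any) for an SSI name,
    evaluated just after its definition.

    Hypothetical encoding, assuming a loop-free region:
      uses  -- names that have a real (non-phi, non-sigma) use
      sigma -- name -> tuple of names (x_0, ..., x_n) = sigma(name)
      phi   -- name -> x, where name is an argument of x = phi(...)
    We rely on the SSI property that a name's live range reaches no
    branch or merge before its sigma or phi, so a real use of the name
    lies on every path leaving its definition.
    """
    if name in uses:                       # "... = v": used before END
        return True
    if name in sigma:                      # combine over sigma outputs
        return combine(anticipated(x, uses, sigma, phi, combine)
                       for x in sigma[name])
    if name in phi:                        # inherit from the phi result
        return anticipated(phi[name], uses, sigma, phi, combine)
    return False                           # reaches END unused
```

For example, if y0 is split by a σ-function into y1 and y2 and only y1 is ever used, y0 is partially but not totally anticipatable.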

We note in passing that the sophisticated strength-reduction and code- 
motion techniques of SSAPRE [22] are applicable to an SSI-based represen- 
tation, as well, and may benefit from the predication information available 
in SSI. The remainder of this section will focus on practical implementa- 
tions of predicated analyses using SSI form. 

6.2 Sparse Predicated Typed Constant Propagation 

Sparse Predicated Typed Constant (SPTC) Propagation is a powerful anal- 
ysis tool which derives its efficiency from SSI form. It is built on Wegman 
and Zadeck's Sparse Conditional Constant (SCC) algorithm [44] and re- 
moves unnecessary array-bounds and null-pointer checks, computes vari- 
Figure 6.1: Three-level value lattice and two-level executability lattice (Executable / Not Executable) for SCC.

Table 6.1: Meet and binary operation rules on the SCC value lattice.

able types, and performs floating-point- and string-constant-propagation 
in addition to the integer constant propagation of standard SCC. 

We will describe this algorithm incrementally, beginning with the stan- 
dard SCC constant-propagation algorithm for review. Wegman and Zadeck's 
algorithm operates on a program in SSA form; we will call this SCC/SSA 
to differentiate it from SCC/SSI, using the SSI form, which we will describe 
in section 6.2.2. Section 6.3 on page 72 will discuss an extension to SPTC 
which does bit-width analysis. 

6.2.1 Wegman and Zadeck's SCC/SSA algorithm 

The SCC algorithm works on a simple three-level value lattice asso- 
ciated with variable definition points and a two-level executability lattice 
associated with flow-graph edges. These lattices are shown in Figure 6.1. 
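The three-level value lattice can be sketched in a few lines of Python, following this paper's convention that ⊥ means "nothing known yet" and ⊤ means "possibly more than one value". Because the entries of Table 6.1 are not reproduced here, the precedence given to ⊥ over ⊤ in `binop` is our assumption.

```python
# Flat SCC value lattice, using this paper's convention:
# BOTTOM = nothing known yet, TOP = possibly more than one value.
BOTTOM = object()
TOP = object()

def meet(a, b):
    """Combination at a phi-node: the least upper bound of a and b."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    if a is TOP or b is TOP:
        return TOP
    return a if a == b else TOP   # two different constants: overdefined

def binop(op, a, b):
    """Lift a concrete binary operator op to lattice values.

    Assumption: BOTTOM takes precedence over TOP (a still-unknown
    operand yields "unknown"); Table 6.1's exact entries may differ.
    """
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if a is TOP or b is TOP:
        return TOP
    return op(a, b)
```

Each variable's value can rise at most twice (⊥ to a constant, then to ⊤), which is the $D - 1 = 2$ bound used in the complexity analysis of SCC/SSA.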
Init(G: CFG) =
    E_e ← ∅
    E_n ← ∅
    for each variable v in G do
        if some node n defines v then
            V[v] ← ⊥
        else
            V[v] ← ⊤    /* Procedure arguments, etc. */

Analyze(G: CFG) =
1:  let r be the start node of graph G
2:  E_n ← E_n ∪ {r}
3:  W_n ← {r}
4:  W_v ← ∅
5:
6:  repeat
7:      if W_n is not empty then
8:          remove some node n from W_n
9:          if n has only one outgoing edge e and e ∉ E_e then
10:             RaiseE(e)
11:         Visit(n)
12:     if W_v is not empty then
13:         remove some variable v from W_v
14:         for each node n containing a use of v do
15:             Visit(n)
16: until both W_v and W_n are empty
Algorithm 6.1: SCC algorithm for SSA form. 
RaiseE(e: edge) =
1:  /* When called, e ∉ E_e */
2:  E_e ← E_e ∪ {e}
3:  let n be the destination of edge e
4:  if n ∉ E_n then
5:      E_n ← E_n ∪ {n}
6:      W_n ← W_n ∪ {n}

RaiseV(v: variable, E: lattice value) =
    if V[v] ⊏ E then
        V[v] ← E
        W_v ← W_v ∪ {v}

Visit(n: node) =
    for each assignment "v ← x ⊕ y" in n do
        RaiseV(v, V[x] ⊕ V[y])    /* binop rule: see table 6.1 */
    for each assignment "v ← MEM(...)" or "v ← CALL(...)" in n do
        RaiseV(v, ⊤)
    for each assignment "v ← φ(x_1, ..., x_n)" in n do
        for each variable x_i corresponding to predecessor edge e_i of n do
            if e_i ∈ E_e then
                RaiseV(v, V[v] ⊓ V[x_i])    /* meet rule: see table 6.1 */
    for each branch "if v goto e_1 else e_2" in n do
        E ← V[v]
        if (E = ⊤ or E = c where c signifies "true") and e_1 ∉ E_e then
            RaiseE(e_1)
        if (E = ⊤ or E = c where c signifies "false") and e_2 ∉ E_e then
            RaiseE(e_2)
Algorithm 6.2: SCC algorithm for SSA form, cont. 
Associating a lattice value with a definition point is a conservative statement
that, for all possible program paths, the value of that variable has a
certain property. The value lattice is, formally, $\mathrm{Int}_{\bot}$; the lattice value ⊥
signifies that no information about the value is known, the lattice value ⊤
indicates that it is possible that the variable has more than one dynamic
value, and the other lattice entries (corresponding to integer constants and
occupying a flat space between ⊤ and ⊥) indicate that the variable can
be proven to have a single constant value in all runs of the program.¹⁵
Similarly, the executability lattice indicates whether it is possible that the
control flow edge is traversed in some execution of the program (marked
"executable"), or if it can be proven that the edge is never traversed in any
valid program path (marked "not executable"). The algorithm works with
SSA form, and is presented as Algorithms 6.1 and 6.2. Binary operations on lattice
values and combination at φ-nodes follow the rules in Table 6.1; notice that
the meet operation (⊓) is simply the least upper bound on the lattice. The
time complexity of SCC/SSA can be found easily: the procedure RaiseE
puts each node on the $W_n$ worklist at most once, and RaiseV puts a variable
on the $W_v$ worklist at most $D - 1$ times, where $D$ is the maximum lattice
depth. The Visit procedure can thus be invoked a maximum of $N$ times
by line 11 of the Analyze procedure of Algorithm 6.1, and a maximum of
$U_{SSA}(D - 1)$ times by line 15, where $U_{SSA}$ is the number of variable uses in
the SSA representation of the program. The lattice depth $D$ is the constant
3 in this version of the algorithm, so it drops out of the expression. The
RaiseE procedure itself is called at most $E$ times. The time complexity is



15 Note that we follow the ⊤ and ⊥ conventions used in semantics and abstract interpretation;
authors in dataflow analysis (including Wegman and Zadeck in their SCC paper
[44]) often use contrary definitions, letting ⊤ mean undefined and ⊥ indicate overdefinition.
As section 7.3 will discuss the semantics of SSI+ at length, we thought it best to
adhere to one set of definitions consistently, instead of switching mid-paper.
Left: original program

    foo = f();
    if (foo == 1)
        bar = foo + 1;
    else
        bar = 2;

Right: SSI form

    foo_0 = f();
    if (foo_0 == 1)
        (foo_1, foo_2) = σ(foo_0)
        bar_0 = foo_2 + 1;
    else
        bar_1 = 2;
    bar_2 = φ(bar_0, bar_1)

Figure 6.2: A simple constant-propagation example. 

thus $O(E + N + U_{SSA}(D - 1))$, which simplifies to $O(E + U_{SSA})$.

6.2.2 SCC/SSI: predication using σ-functions

Porting the SCC algorithm from SSA to SSI form immediately increases
the number of constants we can find. A simple example is shown in
Figure 6.2: the version of the program on the right is in SSI form, and
SCC/SSI, unlike SCC/SSA, can determine that foo_2 is a constant with
value 1 (although nothing can be said about the value of foo_0 or foo_1) and
therefore that bar_0, bar_1, and bar_2 are constants with the value 2. SSI
form creates a new name for foo at the conditional branch to indicate that
more information about its value is known. 
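The σ-function rules of Algorithm 6.3 (lines 12-18) can be sketched in Python for the example of Figure 6.2. The function name, the encoding of literals as names bound to constant values, and the choice to return a (true-edge, false-edge) pair are our assumptions.

```python
# Flat-lattice sentinels (paper's convention: BOTTOM = no information).
BOTTOM = object()
TOP = object()

def lattice_min(a, b):
    """The lower (more precise) of two comparable flat-lattice values."""
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if a is TOP:
        return b
    return a  # a is a constant, so b is TOP or the same constant

def sigma_values(L, x, y, v0, true_executable, false_executable):
    """Lattice values for the two outputs of a sigma-function on v0
    attached to the branch "if x = y goto e1 else e2".

    L maps names to lattice values; literals are modelled as names bound
    to constant values (e.g. L["c1"] == 1). Returns the pair
    (value on the true edge e1, value on the false edge e2); an output
    whose edge is not yet executable stays BOTTOM.
    """
    on_true = on_false = BOTTOM
    if true_executable:
        if v0 in (x, y):
            # Along e1 we know x == y, so v0 takes the better-known value.
            on_true = lattice_min(L[x], L[y])
        else:
            on_true = L[v0]
    if false_executable:
        on_false = L[v0]
    return on_true, on_false
```

With L[foo_0] = ⊤, the true-edge name foo_2 receives the constant 1, so bar_0 = foo_2 + 1 folds to 2, while the false-edge name foo_1 keeps ⊤.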

Only the Visit procedure must be updated for SCC/SSI: lattice update
rules for σ-functions must be added. Algorithm 6.3 shows a new Visit
procedure for the three-level integer constant lattice of Wegman and Zadeck's
SCC/SSA; with this restricted value set, only integer equality tests tap the
algorithm's full power. The utility of SCC/SSI's predicated analysis will
become more evident as the value lattice is extended to cover more constant
types.

The time complexity of the updated algorithm is identical to that of 
Visit(n: node) =
1:  /* Assignment rules as on page 58 */
2:
3:  for each branch "if x = y goto e_1 else e_2" in n do
4:      if L[x] = ⊤ or L[y] = ⊤ then
5:          RaiseE(e_1)
6:          RaiseE(e_2)
7:      else if L[x] = c and L[y] = d then
8:          if c = d then
9:              RaiseE(e_1)
10:         else
11:             RaiseE(e_2)
12:     for each assignment "(v_1, v_2) ← σ(v_0)" associated with this branch do
13:         if edge e_1 ∈ E_e and variable v_0 is the x or y in the test then
14:             RaiseV(v_1, min(L[x], L[y]))
15:         else if edge e_1 ∈ E_e then
16:             RaiseV(v_1, L[v_0])
17:         if edge e_2 ∈ E_e then    /* False branch */
18:             RaiseV(v_2, L[v_0])
19:
20: /* Obvious generalization applies for tests like "x ≠ y" */

Algorithm 6.3: A revised Visit procedure for SCC/SSI. 







Figure 6.3: SCC value lattice extended to Java primitive value domain (constant entries: float, double, int, long, String).
SCC/SSA: $O(E + U_{SSA})$, by the same argument as before.

6.2.3 Extending the value domain 

The first simple extension of the SCC value lattice enables us to represent
floating-point and other values. For this work, we extended the domain
to cover the full type system of Java bytecode [15]; the extended lattice is
presented in Figure 6.3. The figure also introduces the abbreviated lattice
notation we will use throughout the following sections; it is understood that
the lattice entry labelled "int" stands for a finite-but-large set of incomparable
lattice elements, consisting (in this case) of the members of the
Java int integer type. Java ints are 32 bits long, so the "int" entry abbreviates
$2^{32}$ lattice elements. Similarly, the "double" entry encodes not
the infinite domain of real numbers, but the domain spanned by the Java
double type, which has fewer than $2^{64}$ members.¹⁶ The Java String type is
also included, to allow simple constant string coalescing to be performed.
The propagation algorithm over this lattice is a trivial modification to
Algorithm 6.3, and will be omitted for brevity. In the next sections, the
"int" and "long" entries in this lattice will be summarized as "Integer Con- 



16 In IEEE-standard floating-point, some possible bit patterns are not valid number 
encodings. 
Hierarchy              Source language   Classes   Avg. depth   Max. depth
FLEX infrastructure    Java                  550          1.9            5
javac compiler         Java                  304          2.8            7
NeXTStep 3.2†          Objective-C           488          3.5            8
Objectworks 4.1†       Smalltalk             774          4.4           10

† indicates data obtained from Muthukrishnan and Müller [28].

Table 6.2: Class hierarchy statistics for several large object-oriented projects.

stant" , the "float" and "double" entries as "Floating-point Constant" , and 
the "String" entry as "String Constant". As the lattice is still only three 
levels deep, the asymptotic runtime complexity is identical to that of the 
previous algorithm. 

6.2.4 Type analysis 

In Figure 6.4 we extend the lattice to compute Java type information.
The new lattice entry marked "Typed" is actually forest-structured as
shown in Figure 6.5; it is as deep as the class hierarchy, and the roots
and leaves are all comparable to ⊤ and ⊥. Only the Visit procedure must
be modified; the new procedure is given as Algorithm 6.4. Because the
lattice L is deeper, the asymptotic runtime complexity is now $O(E + U_{SSA}D_c)$,
where $D_c$ is the maximum depth of the class hierarchy. To form an
estimate of the magnitude of $D_c$, Table 6.2 compares class hierarchy statistics
for several large object-oriented projects in various source languages. Our
FLEX compiler infrastructure, as a typical Java example, has an average
class depth of 1.91.¹⁷ In a contrived example, of course, one can make the class
depth $O(N)$; however, one can infer from the data given that in real code
the $D_c$ term is not likely to make the algorithm significantly non-linear.



17 Measured August 2, 1999; the infrastructure is under continuing development. 
Figure 6.4: SCC value lattice extended with type information (entries shown: ⊤, Typed, Integer Constant, Floating-point Constant, String Constant, ⊥).




Figure 6.5: "Typed" category of Figure 6.4 shown expanded (labels shown: java.lang.Object, non-void primitive types, java.lang.Number, java.lang.String, Null Constant, Integer Constant; the class hierarchy spans $D_c$ lattice levels).
Visit(n: node) =
1:  for each assignment "v ← x ⊕ y" in n do
2:      RaiseV(v, V[x] ⊕ V[y])    /* binop rule: see figure 6.6 */
3:
4:  for each assignment "v ← MEM(...)" or "v ← CALL(...)" in n do
5:      let t be the type of the MEM or CALL expression
6:      RaiseV(v, t)
7:
8:  for each assignment "v ← φ(x_1, ..., x_n)" in n do
9:      for each variable x_i corresponding to predecessor edge e_i of n do
10:         if e_i ∈ E_e then
11:             RaiseV(v, ⊔{V[v], V[x_i]})    /* meet rule: use least upper bound */
12:
13: for each branch "if x = y goto e_1 else e_2" in n do
14:     if Typed ⊑ L[x] or Typed ⊑ L[y] then
15:         RaiseE(e_1)
16:         RaiseE(e_2)
17:     else if L[x] = c and L[y] = d then    /* if x and y are constants... */
18:         if c = d then
19:             RaiseE(e_1)
20:         else
21:             RaiseE(e_2)
22:     for each assignment "(v_1, v_2) ← σ(v_0)" associated with this branch do
23:         if edge e_1 ∈ E_e and variable v_0 is the x or y in the test then
24:             /* type error in source program if L[x] and L[y] are incomparable */
25:             RaiseV(v_1, min(L[x], L[y]))
26:         else if edge e_1 ∈ E_e then
27:             RaiseV(v_1, L[v_0])
28:         if edge e_2 ∈ E_e then    /* False branch */
29:             RaiseV(v_2, L[v_0])
30:
31: /* Obvious generalization applies for tests like "x ≠ y" */
32: /* Obvious generalization applies for tests like "x instanceof C" */

Algorithm 6.4: Visit procedure for typed SCC/SSI. 
int ⊕ int = int

long ⊕ {int, long} = long

float ⊕ {int, long, float} = float

double ⊕ {int, long, float, double} = double

String ⊕ {int, long, float, double, Object, ...} = String

Figure 6.6: Java typing rules for binary operations. 
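The rules of Figure 6.6 amount to Java's binary numeric promotion plus string concatenation. A minimal Python sketch, assuming types are represented as strings and that char, boolean, short and byte have already been folded into int (footnote 18); shift operators, whose result type depends on the left operand alone, are not modelled.

```python
# Result type of a Java binary operator over the types of Figure 6.6.
# Assumptions: narrow integral types are already folded into int, and
# shift operators are not modelled.
NUMERIC_RANK = {"int": 0, "long": 1, "float": 2, "double": 3}

def binop_type(t1, t2, op="+"):
    """Return the result type of `t1 op t2`, or raise TypeError."""
    # String concatenation: '+' with a String operand accepts anything.
    if op == "+" and "String" in (t1, t2):
        return "String"
    if t1 not in NUMERIC_RANK or t2 not in NUMERIC_RANK:
        raise TypeError(f"no arithmetic {op!r} on {t1} and {t2}")
    # Binary numeric promotion: the wider of the two operand types wins.
    return max(t1, t2, key=NUMERIC_RANK.__getitem__)
```

Because each rule picks the wider operand type, the five lines of the figure collapse into a single maximum over a four-element ranking.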

A brief word on the roots of the hierarchy forest in Figure 6.5 is called
for: Java has both a class hierarchy, rooted at java.lang.Object, and
several primitive types, which we will also use as roots. The primitive
types include int, long, float, and double.¹⁸ Integer constants in the
lattice are comparable to and less than the int or long type; floating-point
constants are likewise comparable to and less than either float or double.
String constants are comparable to and less than the java.lang.String
non-primitive class type.

The void type, which is the type of the expression null, is also a primitive
type in Java; however, we wish to keep $x \sqcap y$ identical to $\bigsqcup\{x, y\}$ (the
least upper bound of x and y) while satisfying the Java typing rule that
$\mathit{null} \sqcap x = x$ when x is a non-primitive type and not a constant. This
requires putting void comparable to but less than every non-primitive leaf 
in the class hierarchy lattice. 
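The meet rule for reference types is the least upper bound in the class-hierarchy forest, with null (the void type) lying below every non-primitive class. The Python sketch below assumes single inheritance only, ignoring interfaces (which, as noted next, add further roots); the helper names are hypothetical.

```python
# Least upper bound on the reference-type part of the "Typed" forest.
# Assumptions (ours): single inheritance only, and the null type,
# written "void" as in the text, lies below every non-primitive class.
NULL = "void"

def ancestors(cls, parent):
    """cls followed by its superclasses, nearest first."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain

def lub(a, b, parent):
    """Least upper bound of two reference types a and b."""
    if a == NULL:
        return b            # null meet x = x for non-primitive x
    if b == NULL:
        return a
    above_a = set(ancestors(a, parent))
    for c in ancestors(b, parent):  # climb from b until we meet a's chain
        if c in above_a:
            return c
    raise ValueError(f"{a} and {b} lie in different trees of the forest")
```

Each call climbs at most $D_c + 1$ classes per operand, where $D_c$ is the depth of the class hierarchy.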

The Java class hierarchy also includes interfaces, which are the means 
by which Java implements multiple inheritance. Base interface classes 



18 In the type system our infrastructure uses (which is borrowed from Java bytecode) 
the char, boolean, short and byte types are folded into int. 
Figure 6.7: Value lattice extended with array and null information (entries shown: ⊤, Typed, Fixed-length Array, Integer Constant, Floating-point Constant, String Constant, Null Constant).

(which do not extend other interfaces) are additional roots in the hierarchy- 
forest, although no examples of this are shown in Figure 6.5. 

Since untypeable variables are generally forbidden, no operation should
ever raise a lattice value above "Typed" to ⊤. The otherwise-unnecessary
⊤ element is retained to indicate error conditions.

This variant of the constant-propagation algorithm allows us to elim- 
inate unnecessary instanceof checks due to type-casting or type-safety 
checks. Section 6.2.6 will provide experimental validation of its utility. 

Finally, note that the ability to represent null as the void type in the
lattice begins to allow us to address null-pointer checks, although because
$\mathit{null} \sqcap x = x$ for non-primitive types we can only reason about variables
which can be proven to be null, not those which might be proven to be
non-null (which is the more useful case). The next section will provide a
more satisfactory treatment.

6.2.5 Addressing array-bounds and null-pointer checks 

At this point, we can expand the value lattice once more to allow elim- 
$$
\begin{array}{l}
\forall C \in \mathit{Class},\quad C_{\text{non-null}} \sqsubset C_{\text{possibly-null}} \\
\forall C \in \mathit{Class}_{\text{non-null}},\quad \bigsqcup\{\mathit{void}, C\} \in \mathit{Class}_{\text{possibly-null}} \\
\forall C \in \mathit{Class}_{\text{possibly-null}},\quad \mathit{void} \sqsubset C \\
\forall C \in \mathit{Class}_{\text{non-null}},\quad (\mathit{void}, C) \notin {\sqsubseteq}
\end{array}
$$

Let A(C, n) be a function to turn a lattice entry representing a non-null
array class type C into the lattice entry representing said array class with
known integer constant length n. Then for any non-null array class C and
integers i and j,

$$
A(C, i) \sqsubset C, \qquad (A(C, i), A(C, j)) \in {\sqsubseteq} \iff i = j.
$$

Figure 6.8: Extended value lattice inequalities.



    x = 5 + 6;
    do {
        y = new int[x];
        z = x - 1;
        if (0 <= z && z < y.length)
            y[z] = 0;
        else
            x--;
    } while (P);

Figure 6.9: An example illustrating the power of combined analysis. 
Visit(n: node) =
1:  /* Binop and φ-function rules as in algorithm 6.4 */
2:  for each assignment "v ← MEM(...)" or "v ← CALL(...)" in n do
3:      let t ∈ Class_possibly-null ∪ Class_primitive be the type of the MEM or CALL
4:      RaiseV(v, t)
5:
6:  for each array creation expression "v ← new T[x]" do
7:      if L[x] is an integer constant then
8:          RaiseV(v, A(T, L[x]))
9:      else
10:         RaiseV(v, T_non-null)
11:
12: for each array length assignment "v ← arraylength(x)" do
13:     if L[x] is an array of known constant length n then
14:         RaiseV(v, n)
15:     else
16:         RaiseV(v, int)
17:
18: /* Branch rules as in algorithm 6.4, with the obvious extension to allow tests
19:    against null to lower a lattice value from Class_possibly-null to Class_non-null. */

Algorithm 6.5: Visit procedure outline with array and null information. 
    if (10 < 0)
        throw new NegativeArraySizeException();
    int[] A = new int[10];
    if (0 < 0 || 0 >= A.length)
        throw new ArrayIndexOutOfBoundsException();
    A[0] = 1;
    for (int i = 1; i < 10; i++) {
        if (i < 0 || i >= A.length)
            throw new ArrayIndexOutOfBoundsException();
        A[i] = 0;
    }

Figure 6.10: Implicit bounds checks (underlined) on Java array references. 

ination of unnecessary array-bounds and null-pointer checks, based on our 
constant-propagation algorithm. The new lattice is shown in Figure 6.7; we
have split the "Typed" lattice entry to enable the algorithm to distinguish
between non-null and possibly-null values,¹⁹ and added a lattice level for
arrays of known constant length. A partial formal definition of the new value
lattice can be found in Figure 6.8; the meet rule is still the least upper
bound on the lattice. Modifications to the Visit procedure are outlined
in Algorithm 6.5. Notice that we exploit the pre-existing integer-constant
propagation to identify constant-length arrays, and that our integrated
approach allows one-pass optimization of the program in Figure 6.9.

Note that the variable renaming performed by the SSI form at control-flow
splits is essential in allowing the algorithm to do null-pointer check
elimination. However, the lattice we are using can remove bounds checks
from an expression A[k] when k is a constant, but not when k is a bounded


19 Values which are always-null were discussed in the previous section; they are identified 
as having primitive type void. 
induction variable. In the example of Figure 6.10 on the preceding page,
the first two implicit checks are optimized away by this version of the 
algorithm, but the loop-borne test is not. 
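The array rules of Algorithm 6.5 can be illustrated with a small Python sketch. The class names KnownLength and NonNull, and the use of the string "int" for an unknown int value, are our own hypothetical encoding of the lattice entries A(C, n), T_non-null and int.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnownLength:
    """Lattice entry A(C, n): a non-null array of class C with length n."""
    cls: str
    length: int

@dataclass(frozen=True)
class NonNull:
    """Lattice entry for a non-null array of class C, length unknown."""
    cls: str

def new_array_value(cls, length_value):
    """Value for "v <- new T[x]", given the lattice value of x."""
    if isinstance(length_value, int):     # x is an integer constant
        return KnownLength(cls, length_value)
    return NonNull(cls)

def array_length_value(array_value):
    """Value for "v <- arraylength(x)"; "int" stands for any int."""
    if isinstance(array_value, KnownLength):
        return array_value.length
    return "int"

def bounds_check_redundant(array_value, index_value):
    """True if the check on A[k] can be dropped: the array length and k
    are both known constants with 0 <= k < length."""
    return (isinstance(array_value, KnownLength)
            and isinstance(index_value, int)
            and 0 <= index_value < array_value.length)
```

For Figure 6.10, A = new int[10] receives KnownLength("int[]", 10), so the check guarding A[0] is redundant; inside the loop the index i is only known to be an int, so that check remains.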

A typical array-bounds check (as shown in the example on the preceding
page) verifies that the index i of the array reference satisfies the
condition $0 \le i < n$, where n is the length of the array.²⁰ By identifying
integer constants as either positive, negative, or zero, the first half of the
bounds check may be eliminated. This requires a simple extension of the
integer constant portion of the lattice, outlined in Figure 6.11 on the facing
page, with negligible performance cost. However, handling upper bounds
completely requires a symbolic analysis that is out of the current scope
of this work. Future work will use induction variable analysis and inte- 
grate an existing integer linear programming approach [36] to fully address 
array-bounds checks. 
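The sign-based test for the lower half of a bounds check can be sketched in a few lines of Java. This is a minimal sketch under assumptions not taken from FLEX. An abstract integer is either ⊥, a known constant, or a set of possible signs drawn from {M, Z, P}, with the full set standing in for the top element (MZP). The names `Sign`, `SignValue`, `join`, and `lowerBoundCheckNeeded` are hypothetical.

```java
import java.util.EnumSet;

// Possible signs of an integer value: negative (M), zero (Z), positive (P).
enum Sign { M, Z, P }

// Hypothetical abstract value for the signed-integer lattice: bottom, a known
// constant, or a set of possible signs ({M, Z, P} plays the role of the top (MZP)).
final class SignValue {
    static final SignValue BOTTOM = new SignValue(false, 0, EnumSet.noneOf(Sign.class));

    final boolean isConstant;
    final int constant;          // meaningful only when isConstant is true
    final EnumSet<Sign> signs;   // signs the value may take; empty for bottom

    private SignValue(boolean isConstant, int constant, EnumSet<Sign> signs) {
        this.isConstant = isConstant;
        this.constant = constant;
        this.signs = signs;
    }

    static SignValue ofConstant(int c) {
        Sign s = c < 0 ? Sign.M : (c == 0 ? Sign.Z : Sign.P);
        return new SignValue(true, c, EnumSet.of(s));
    }

    static SignValue ofSigns(EnumSet<Sign> signs) {
        return new SignValue(false, 0, EnumSet.copyOf(signs));
    }

    // Combine rule: least upper bound of two lattice values.
    SignValue join(SignValue other) {
        if (this == BOTTOM) return other;
        if (other == BOTTOM) return this;
        if (isConstant && other.isConstant && constant == other.constant) return this;
        EnumSet<Sign> union = EnumSet.copyOf(signs);
        union.addAll(other.signs);
        return ofSigns(union);
    }

    // The "0 <= i" half of a bounds check is redundant unless i may be negative.
    boolean lowerBoundCheckNeeded() {
        return signs.contains(Sign.M);
    }
}
```

Joining the constants 0 and 5 yields the sign set {Z, P}, so the lower-bound test on such an index could be dropped. Joining -1 and 3 yields {M, P}, and the test must stay.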

6.2.6 Experimental results 

The full SPTC analysis and optimization has been implemented in the FLEX Java compiler platform.²¹ Figure 6.12 gives a quantitative measure of the utility of SPTC. The "run-times" given are dynamic statement counts for the intermediate representation, generated by the FLEX compiler's SSI IR interpreter. The FLEX infrastructure is still under development, and its backends are not yet stable enough to produce directly executable code. As such, the numbers bear only a tenuous relation to reality. In particular, they do not represent branch delays on real architectures, which null-pointer check elimination is meant to avoid. Furthermore, the intermediate representation interpreter gives two-operand instructions the same cycle count as loading constants, which tends to negate most of the benefit of constant propagation. As the figure shows, the standard Wegman-Zadeck SCC algorithm, which has proven utility in practice, shows no improvement over unoptimized code under this metric. Even so, SPTC shows a 10% speed-up. We expect the improvement in actual practice to be greater.

20 Languages in which array indices start at 1 can be handled by slight modifications to the same techniques.

21 See section 8 for details of methodology.

[Figure 6.11 diagram: the top element ⊤ = (MZP) lies above the sign classes (MZ-), (-ZP), and (M-P).]

Figure 6.11: An integer lattice for signed integers. A classification into negative (M), positive (P), or zero (Z) is grafted onto the standard flat integer constant domain. The (M-P) entry is duplicated to aid clarity.



Note that the speed-up is constant despite widely differing test cases. 
The "Hello world" example actually executes quite a bit of library code 
in the Java implementation; this includes numerous element-by-element 
array initializations (due to the semantics of Java bytecode) which we expect 
SPTC to excel at optimizing. But SPTC does just as well on the full FLEX 
compiler (68,032 lines of source at the time the benchmark was run), which 
shows that the speed-up is not limited to constant initialization code. 
Figure 6.12: SPTC optimization performance.



6.3 Bit-width analysis 

The SPTC algorithm can be extended to allow efficient bit-width analysis. Bit-width analysis is a variation of constant propagation whose goal is to determine value ranges for variables. In this sense it is similar to, but simpler than, array-bounds analysis. No symbolic manipulation is required, and the value lattice has N levels instead of $2^N$, where N is the maximum bit-width of the underlying datatype. For the 32-bit integers of C and Java, this means that only 32 levels need be added to the lattice, so the bit-width analysis can be made efficient.

Bit-width analysis allows optimization for modern media-processing instruction-set extensions, which typically offer vector processing of limited-width types. Intel's MMX extensions, for example, offer packed 8-bit, 16-bit, 32-bit and 64-bit vectors [30]. To take advantage of these functional units without explicit human annotation, the compiler must be able to guarantee that the data in a vector can be expressed using the limited bit-width available. A simpler bit-width analysis in a previous work [3] showed that a large amount of width-limit information can be extracted from appropriate source programs. However, that work could not intelligently compute the widths of loop-bound variables, because of the limitations of SSA form. Extending the bit-width algorithm to SSI form allows the detection of induction variables whose widths are limited by loop bounds.

$$-(M, P) = (P, M)$$
$$(M_l, P_l) + (M_r, P_r) = (1 + \max(M_l, M_r),\ 1 + \max(P_l, P_r))$$
$$(M_l, P_l) \times (M_r, P_r) = (\max(M_l + P_r,\ P_l + M_r),\ \max(M_l + M_r,\ P_l + P_r))$$
$$(0, P_l) \wedge (0, P_r) = (0, \min(P_l, P_r))$$
$$(M_l, P_l) \wedge (M_r, P_r) = (\max(M_l, M_r),\ \max(P_l, P_r))$$

Figure 6.13: Some combination rules for bit-width analysis.
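SSI form helps with loop bounds through the σ-function at the loop test. The name that reaches the loop body on the true edge of `x < bound` can be given a narrower width than the name that was tested. In SSA form there is no separate name to record this on. Below is a minimal Java sketch of that refinement. It assumes widths are kept as a pair (m, p) of negative and positive widths and that `bound` is a compile-time constant. `LoopWidth`, `bitsFor`, and `refineLessThan` are hypothetical names, not FLEX code.

```java
// Hypothetical width pair for a signed value: m = bits for negative values, p = bits for positive values.
final class LoopWidth {
    final int m;
    final int p;

    LoopWidth(int m, int p) {
        this.m = m;
        this.p = p;
    }

    // Bits needed for a non-negative value v (0 needs 0 bits).
    static int bitsFor(int v) {
        return 32 - Integer.numberOfLeadingZeros(v);
    }

    // Width of the name produced on the true edge of the sigma-function for "x < bound".
    static LoopWidth refineLessThan(LoopWidth x, int bound) {
        if (bound <= 0) {
            // x < bound <= 0 forces x negative: no positive values reach the true edge.
            return new LoopWidth(x.m, 0);
        }
        // Every value on the true edge is at most bound - 1; negative values are unaffected.
        return new LoopWidth(x.m, Math.min(x.p, bitsFor(bound - 1)));
    }
}
```

Consider `for (int i = 0; i < 10; i++)`. The body's copy of `i` is capped at 4 positive bits, regardless of what is known about the merged `i` at the loop header.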

Bit-width analysis is also a vital step in compiling a high-level language to a hardware description. General-purpose programming languages do not contain the fine-grained bit-width information that a hardware implementation can take advantage of, so the compiler must extract it itself. The work cited above [3] showed that such extraction is viable and efficient.

The bit-width analysis algorithm has been implemented in the FLEX compiler infrastructure. Because most types in Java are signed, bit-width information must be separated into "positive width" and "negative width." This is just an extension of the signed value lattice of Figure 6.11 to variable bit-widths. In practice the bit-widths are represented by a tuple (M, P), where M is the negative width and P the positive width. This extends the integer constant lattice with $(\mathit{Int} \times \mathit{Int})_\bot$ under the natural total ordering of Int. The tuple (0, 0) is identical to the constant 0, and the tuple (0, 16) represents an ordinary unsigned 16-bit data type. The ⊤ element is represented by an appropriate tuple reflecting the source-language semantics of the value's type. Figure 6.13 presents bit-width combination rules for unary negation and for binary addition, multiplication, and bitwise-and. In practice, the rules would be extended to handle operands of zero, one, and other small constants more precisely.
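The rules of Figure 6.13 translate almost directly into code. Below is a minimal Java sketch. It assumes a width pair (m, p), with m the negative width and p the positive width, and it assumes a constant is measured by the bits of its magnitude. `BitWidth` and its methods are hypothetical names. The sketch omits clamping to the ⊤ tuple of the source type, as well as the small-constant refinements mentioned above.

```java
// Hypothetical bit-width value: m = width of negative values, p = width of positive values.
// (0, 0) is the constant 0 and (0, 16) an ordinary unsigned 16-bit value.
final class BitWidth {
    final int m;
    final int p;

    BitWidth(int m, int p) {
        this.m = m;
        this.p = p;
    }

    // Assumed convention: a constant needs as many bits as its magnitude.
    static BitWidth ofConstant(int c) {
        if (c >= 0) return new BitWidth(0, 32 - Integer.numberOfLeadingZeros(c));
        long magnitude = -(long) c;   // long avoids overflow for Integer.MIN_VALUE
        return new BitWidth(64 - Long.numberOfLeadingZeros(magnitude), 0);
    }

    // Lattice join: componentwise maximum under the natural ordering of Int.
    BitWidth join(BitWidth o) {
        return new BitWidth(Math.max(m, o.m), Math.max(p, o.p));
    }

    // -(M, P) = (P, M)
    BitWidth negate() {
        return new BitWidth(p, m);
    }

    // (Ml, Pl) + (Mr, Pr) = (1 + max(Ml, Mr), 1 + max(Pl, Pr))
    BitWidth add(BitWidth r) {
        return new BitWidth(1 + Math.max(m, r.m), 1 + Math.max(p, r.p));
    }

    // (Ml, Pl) x (Mr, Pr) = (max(Ml + Pr, Pl + Mr), max(Ml + Mr, Pl + Pr))
    BitWidth multiply(BitWidth r) {
        return new BitWidth(Math.max(m + r.p, p + r.m), Math.max(m + r.m, p + r.p));
    }

    // Bitwise-and: (0, Pl) & (0, Pr) = (0, min(Pl, Pr)) when both operands are non-negative;
    // otherwise fall back to (max(Ml, Mr), max(Pl, Pr)).
    BitWidth and(BitWidth r) {
        if (m == 0 && r.m == 0) return new BitWidth(0, Math.min(p, r.p));
        return new BitWidth(Math.max(m, r.m), Math.max(p, r.p));
    }
}
```

The rules are conservative. Adding two non-negative 8-bit values yields (1, 9), even though the sum can never be negative.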



7 An executable representation 

The Static Single Information (SSI) form, as presented in the first half of this thesis, requires control-flow graph information in order to be executable. We would like a demand-driven operational semantics for SSI form that does not require control-flow information, which would free us to reorder execution more flexibly.

In particular, we would like a representation that eliminates unnecessary control dependencies such as those in the program of Figure 7.1. A control-flow graph for this program, as written, will explicitly specify that no assignments to B[] take place until all elements of A[] have been assigned. That is, the second loop will be control-dependent on the first. We would like to remove this control dependence in order to provide greater parallelism: in this case, to allow the assignments to A[] and B[] to take place in parallel, if possible.

In addition, an executable representation allows us to apply the techniques of abstract interpretation [31] more easily. Abstract interpretation can be applied to the original SSI form using information extracted from the control-flow graph. An executable SSI form, however, allows more concise abstract interpretation algorithms, which are therefore easier to derive and verify.

    for (int i=0; i<10; i++)
        A[i] = x;
    for (int j=0; j<10; j++)
        B[j] = y;

Figure 7.1: An example of unnecessary control dependence: the second loop is control-dependent on the first, and so assignments to A[] and B[] cannot take place in parallel.

The modifications outlined here extend SSI form to provide a useful and descriptive operational semantics. We will call the extended form SSI+. For clarity, we will call SSI form as originally presented SSI₀. We will describe algorithms to construct SSI+ efficiently, and illustrate analyses and optimizations using the form.

7.1 Deficiencies in SSI₀

Although a demand-driven execution model can be constructed for SSI₀, it fails to handle loops and imperative constructs well. SSI+ form addresses these deficiencies.

7.1.1 Imperative constructs, pointer variables, and side-effects 

The presentation of SSI₀ ignored pointers, concentrating on so-called register variables. Extending SSI to handle these imperative constructs is quite easy: we simply define a "variable" $S$ to represent an updatable store. This variable is renamed and numbered as before, so that $S_0$ represents the initial contents of the store and $S_i$ represents the contents of the store after some sequence of writes. Figure 7.2 shows a simple imperative program in SSI+ form. Modifications to the store typically take the previous contents of the store as input. Likewise, subroutines whose side-effects modify the store must be written in SSI+ form so that they both take a store and return a store.

    // swap A[i] and B[j]
    x = A[i];
    y = B[j];
    A[i] = y;
    B[j] = x;

    // SSI+ form:
    x0 = FETCH(S0, A + i0)
    y0 = FETCH(S0, B + j0)
    S1 = STORE(S0, A + i0, y0)
    S2 = STORE(S1, B + j0, x0)

Figure 7.2: Use of the "store variable" $S_x$ in SSI+ form.
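The store-passing discipline of Figure 7.2 can be mimicked with an immutable map from addresses to values. Each STORE then yields a new store version and leaves its input intact, just as $S_0$ remains valid after $S_1$ is defined. Below is a minimal Java sketch. The `Store` class, the integer addresses standing in for A + i and B + j, and the `fetch`/`store`/`swap` names are illustrative assumptions, not the FLEX implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable store: each STORE returns a new version (S0 -> S1 -> S2).
final class Store {
    private final Map<Integer, Integer> cells;

    Store(Map<Integer, Integer> cells) {
        this.cells = Map.copyOf(cells);
    }

    // FETCH(S, addr): read a value from this particular version of the store.
    int fetch(int addr) {
        Integer v = cells.get(addr);
        if (v == null) throw new IllegalStateException("uninitialized address " + addr);
        return v;
    }

    // STORE(S, addr, v): produce a new store version; this version is left unchanged.
    Store store(int addr, int value) {
        Map<Integer, Integer> next = new HashMap<>(cells);
        next.put(addr, value);
        return new Store(next);
    }

    // The swap of Figure 7.2, with a = A + i and b = B + j.
    static Store swap(Store s0, int a, int b) {
        int x0 = s0.fetch(a);
        int y0 = s0.fetch(b);
        Store s1 = s0.store(a, y0);
        return s1.store(b, x0);
    }
}
```

Both FETCHes read $S_0$, so their order relative to each other is irrelevant. Only the chain $S_0 \to S_1 \to S_2$ orders the writes.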

A single monolithic store may represent aliasing too coarsely to be useful. Decomposing the store into smaller regions is a straightforward application of pointer analysis, which may benefit from first converting register variables to SSI form. In type-safe languages, defining multiple stores for differing type sets is a trivial implementation of basic pointer analysis. Figure 7.3 shows a simple example of this form of decomposition using two different subtypes (Integer and Float) of a common base class (Number). Pointer analysis is a huge and rapidly growing field which we cannot attempt to summarize here. Suffice it to say that the may-point-to relation from pointer analysis may be used to define a fine-grained model of the store.
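One reading of the type-based decomposition in Figure 7.3 is as follows. Each concrete class gets its own store variable. An operation through a reference of static type T threads the stores of every concrete subtype of T. Below is a minimal Java sketch under that assumption, with the class hierarchy supplied by hand. `TypeStores`, `addClass`, and `storesFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical type-based store partition: one store per concrete class.
final class TypeStores {
    private final Map<String, List<String>> subclasses = new HashMap<>();
    private final Set<String> concrete = new TreeSet<>();

    void addClass(String name, String superclass, boolean isConcrete) {
        subclasses.computeIfAbsent(name, k -> new ArrayList<>());
        if (superclass != null) {
            subclasses.computeIfAbsent(superclass, k -> new ArrayList<>()).add(name);
        }
        if (isConcrete) concrete.add(name);
    }

    // Stores that an operation through a reference of static type `type` may use:
    // those of every concrete class in the subtree rooted at `type`.
    Set<String> storesFor(String type) {
        Set<String> result = new TreeSet<>();
        collect(type, result);
        return result;
    }

    private void collect(String type, Set<String> out) {
        if (concrete.contains(type)) out.add("S^" + type);
        for (String sub : subclasses.getOrDefault(type, List.of())) collect(sub, out);
    }
}
```

Here Number is abstract while Integer and Float are concrete. `F.add` therefore threads only the Float store, while `N.add` must thread both stores, as the two CALLs in the figure do.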

Proper sequencing among statements with side-effects may be handled in a similar way. A special SSI name is used and defined wherever side-effects occur, which imposes an implicit ordering. For maximum symmetry with the store case, we will name this special variable $S_{fx}$. This variable may be further decomposed using effect analysis for more precision.

Precise analysis of side-effects and the store is much more important in C-like languages. The C fragment in Figure 7.4 shows the difficulties one may encounter with pointer variables that may rewrite SSI temporaries. It is possible to deal with this in the manner of Figure 7.3 using explicit stores. With sufficient analysis, one may then write the SSI representation also shown in the figure. The source language for our FLEX compiler does not encounter this difficulty. Java has no pointers to base types, so the compiler does not have to worry about values changing "behind its back" as in the example.

    // N : Number, I : Integer, F : Float
    // I ⊆ N and F ⊆ N
    if (P)
        N = I;
    else
        N = F;
    F.add(3.14159);
    N.add(5);

    // SSI+ form:
    N0 = φ(I0, F0)
    S^F_1 = CALL(add, S^F_0, F0, 3.14159)
    (S^I_1, S^F_2) = CALL(add, S^I_0, S^F_1, N0, 5)

Figure 7.3: Factoring the store ($S_x$) using type information in a type-safe language.

7.1.2 Loop constructs 

Figure 7.5 shows a typical loop in SSI form. Note first that an explicit "control flow" expression (goto L1) is required in order to make sense of the program. Note also that $i_1$, $i_2$ and $i_3$ may be assigned many times dynamically, although statically they have only one definition each. This complicates any sort of demand-driven semantics. Should the φ-function demand the value of $i_0$ or of $i_3$ when it is evaluated the first time? Which of the values of $i_3$ does it receive when the φ-function is subsequently evaluated? A token-based dataflow interpretation fails as well. Tokens for the $i$ variables clearly flow around the loop before flowing out at the end, but the token for $j_0$ seems to be "used up" in the first iteration.

    int x = 1;
    int y = 2;
    int *p = &x;
    if (P)
        p = &y;
    *p = 3;
    return x;

    x0 = 1
    y0 = 2
    p0 = {x}    // p is of type "location set"
    p1 = {y}
    p2 = φ(p0, p1)
    (x1, y1) = DEREF(p2, 3)
    return x1

Figure 7.4: Pointer manipulation of local variables in C.

SSI+ introduces a ζ-function in the block of φ-functions to clarify the loop semantics. The SSI+ form in Figure 7.5 illustrates the nature of this function. The ζ-function arbitrates loop iteration, and the operational semantics of SSI+ form will define it precisely. For now, note that it relates iteration variables (the top tuple of the parameter and result vectors) to loop invariants (the bottom tuple of the vectors). The figure follows the statement ordering of SSI₀. Unlike those of SSI₀, however, the statements of SSI+ could appear in any order without affecting their meaning. The statement label L1 of the SSI₀ representation and its implicit control-flow edge are therefore unnecessary in SSI+.
    // a simple loop
    i = 0;
    do {
        i = i + j;
    } while (i < 5);

    // SSI form:
        i0 = 0
    L1: i1 = φ(i0, i3)
        i2 = i1 + j0
        P0 = (i2 < 5)
        if P0 goto L1
        (i3, i4) = σ(i2)

    // SSI+ form:
    i0 = 0
    ⟨i5⟩      ⟨i3⟩
    ⟨j1⟩ = ζ( ⟨j0⟩ )
    i1 = φ(i0, i5)
    i2 = i1 + j1
    P0 = (i2 < 5)
    (i3, i4) = σ(P0, i2)

Figure 7.5: A simple loop, in SSI and SSI+ forms.

7.2 Definitions 

The signature characteristic of SSI+ is the ζ-function. ζ-functions exist in the same places as φ-functions, and they control loop iteration. The exact semantics may vary (the sections below present two different valid semantics for ζ-functions). Informally, however, ζ-functions can be viewed as "time-warp" operators. They take values from the "past" (previous iterations of the loop, or loop invariants valid when the loop began) and project them into the "future" (the current loop iteration).

There is at most one ζ-function per φ-function block, and it always precedes the φ-functions. Construction of ζ-functions takes place before the renaming step associated with SSI form, and the ζ-functions are then renamed in the same manner as any other definition. The top tuple of the constructed ζ-function contains the names of all variables reaching the guarded φ-function via a backedge. The bottom tuple contains all variables used inside the guarded loop that are not mentioned in the header's φ-function.
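The tuple construction rule above amounts to two set computations per loop. Below is a minimal Java sketch. It assumes each loop has already been summarized, before renaming, into three collections: the variables reaching its header φ-functions along backedges, the variables those φ-functions mention, and the variables used inside the loop. `ZetaTuples` and `build` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical zeta-function tuples for one loop, in source (pre-renaming) names.
final class ZetaTuples {
    final List<String> top = new ArrayList<>();     // iteration variables
    final List<String> bottom = new ArrayList<>();  // loop invariants

    // backedgeNames: variables reaching the header's phi-functions along a backedge.
    // phiNames:      variables mentioned by the header's phi-functions.
    // usedInLoop:    variables used anywhere inside the loop (iteration order is kept).
    static ZetaTuples build(Set<String> backedgeNames, Set<String> phiNames, Iterable<String> usedInLoop) {
        ZetaTuples z = new ZetaTuples();
        z.top.addAll(backedgeNames);
        for (String v : usedInLoop) {
            // Following the rule literally: every used variable that the header
            // phi-functions do not mention goes into the bottom tuple.
            if (!phiNames.contains(v) && !z.bottom.contains(v)) z.bottom.add(v);
        }
        return z;
    }
}
```

Take the loop of Figure 7.5 before renaming. The backedge and the φ-function involve only i, and the body uses i and j. The top tuple is therefore ⟨i⟩ and the bottom tuple ⟨j⟩.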
The SSI+ form also has triggered constants. The time-oriented semantics of SSI+ dictate that each constant must be associated with a trigger specifying at what times (cycles or loop iterations) the value of the constant is valid. These triggers are similar to the constant generators in some dataflow machines [42]. The triggers for a constant c come from the variables defined in the earliest applicable instruction post-dominated by the constant definition statement v = c. This design generates the trigger as soon as it is known that the constant definition statement will always execute. In practice it is necessary to introduce a bogus trigger variable, $C_T$, which is generated at the START node and used to trigger any constants that otherwise lack a suitable generator. If the use of the constant does not post-dominate the START node, $C_T$ will have to be threaded through φ- and σ-functions to reach the earliest post-dominated node.

7.3 Semantics 

We will base the operational semantics of SSI+ on a demand-driven dataflow model. We will define both a cycle-oriented semantics and an event-driven semantics, which (incidentally) correspond to synchronous and asynchronous hardware models.

Following the lead of Pingali [31], we present Plotkin-style semantics [33] in which configurations are rewritten instead of programs. The configurations represent program state, and transitions correspond to steps in program execution. The set of valid transitions is generated from the program text.

The semantics operate over a lifted value domain $V = \mathit{Int}_\bot$. When some variable $t = \bot_V$ we say it is undefined; conversely, $t \ne \bot_V$ indicates that the variable is defined. The semantics do not explicitly handle "store" metavariables $S_x$, but the extension is trivial with an appropriate redefinition of the value domain $V$. Floating-point and other types are also trivial extensions. The metavariables $c$ and $v$ stand for elements of $V$.

We also define a domain of variable names, $\mathit{Nam} = \{n_0, n_1, \ldots\}$. The metavariables $t$ and $P$ stand for elements of $\mathit{Nam}$, although $P$ will be reserved for naming branch predicates.

A fixed set of "built-in" operators, $op$, is defined, of type $V^* \to V$. If any operator argument is $\bot$, the result is also $\bot$. Constants are implemented as a special case of the general operator rule: an $op$ producing a constant has a single trigger input which does not affect the output.

7.3.1 Cycle-oriented semantics 

In the cycle-oriented semantics, configurations consist of an environment, $\rho$, which maps names in $\mathit{Nam}$ to values in $V$.

Definition 7.1. 

1. An environment $\rho : N \to V$ is a finite function: its domain $N \subseteq \mathit{Nam}$ is finite. The notation $\rho[t \mapsto c]$ represents an environment identical to $\rho$ except for the name $t$, which is mapped to $c$.

2. The null environment $\rho_0$ maps every $t \in N$ to $\bot_V$.

3. A configuration consists of an environment. The initial configuration is $\rho_0[C_T \mapsto 0]$ extended with mappings for procedure parameters. That is, all names in $N$ are mapped to $\bot_V$ with two exceptions: the default constant trigger $C_T$ is mapped to 0,²² and any procedure parameters are mapped to their proper entry values.

Figure 7.6 shows the cycle-oriented transition rules for SSI+ form. Each rule is listed under the definition it applies to, with a precondition above the line and a transition below it. A transition can be performed when its definition is present in the SSI+ form and the precondition above the line is satisfied.

22 Any non-⊥ value would do.

For $t = op(t_1, \ldots, t_n)$:

$$\frac{\rho[t] = \bot \wedge \rho[t_1] \ne \bot \wedge \cdots \wedge \rho[t_n] \ne \bot}{\rho \to \rho[t \mapsto op(\rho[t_1], \ldots, \rho[t_n])]}$$

For $t = \phi(t_1, \ldots, t_n)$:

$$\frac{\rho[t] = \bot \wedge \rho[t_j] \ne \bot \wedge \text{all other } \rho[t_1], \ldots, \rho[t_n] = \bot}{\rho \to \rho[t \mapsto \rho[t_j]]}$$

For $(t_1, \ldots, t_n) = \sigma(P, t)$:

$$\frac{\rho[P] = v \ne \bot \wedge \rho[t_{v+1}] = \bot \wedge \rho[t] \ne \bot}{\rho \to \rho[t_{v+1} \mapsto \rho[t]]} \qquad (0 \le v \le n-1)$$

For $\begin{bmatrix} \langle t_1, \ldots, t_n \rangle \\ \langle t_{n+1}, \ldots, t_m \rangle \end{bmatrix} = \zeta \begin{bmatrix} \langle t'_1, \ldots, t'_n \rangle \\ \langle t'_{n+1}, \ldots, t'_m \rangle \end{bmatrix}$:

$$\frac{\rho[t_j] = \bot \wedge \rho[t'_j] \ne \bot}{\rho \to \rho[t_j \mapsto \rho[t'_j]]} \qquad (1 \le j \le n)$$

$$\frac{\rho[t'_{n+1}] \ne \bot \wedge \cdots \wedge \rho[t'_m] \ne \bot}{\rho \to \rho_0[t_1 \mapsto \rho[t_1]] \cdots [t_n \mapsto \rho[t_n]][t_{n+1} \mapsto \rho[t'_{n+1}]] \cdots [t_m \mapsto \rho[t'_m]]}$$

Figure 7.6: Cycle-oriented transition rules for SSI+.
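To make the cycle-oriented rules concrete, here is a minimal Java interpreter for the op, φ, and σ rules of Figure 7.6. The ζ rules are deliberately left out, so it only handles acyclic code. The sketch rests on several assumptions:

- ⊥ is represented by a name's absence from the environment map.
- A predicate value selects a σ target by 0-based list index, corresponding to $t_{v+1}$.
- A constant is an op whose single trigger input is ignored.

`Stmt`, `Op`, `Phi`, `Sigma`, and `CycleInterpreter` are hypothetical names, and the code uses Java records. The fixed sweep order is just one of the orders the nondeterministic rules permit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// A statement fires its transition if its precondition holds and reports whether it fired.
// A name absent from the environment is bottom (undefined).
interface Stmt {
    boolean step(Map<String, Integer> env);
}

// t = op(t1, ..., tn): fires when t is undefined and every argument is defined.
record Op(String target, List<String> args, Function<int[], Integer> fn) implements Stmt {
    public boolean step(Map<String, Integer> env) {
        if (env.containsKey(target)) return false;
        int[] vals = new int[args.size()];
        for (int k = 0; k < vals.length; k++) {
            Integer v = env.get(args.get(k));
            if (v == null) return false;   // strictness: a bottom argument blocks the op
            vals[k] = v;
        }
        env.put(target, fn.apply(vals));
        return true;
    }
}

// t = phi(t1, ..., tn): fires when t is undefined and exactly one argument is defined.
record Phi(String target, List<String> args) implements Stmt {
    public boolean step(Map<String, Integer> env) {
        if (env.containsKey(target)) return false;
        String defined = null;
        for (String a : args) {
            if (env.containsKey(a)) {
                if (defined != null) return false;   // more than one defined: rule does not apply
                defined = a;
            }
        }
        if (defined == null) return false;
        env.put(target, env.get(defined));
        return true;
    }
}

// (t1, ..., tn) = sigma(P, t): when P = v and t are defined, copy t into target v (0-based).
record Sigma(List<String> targets, String pred, String source) implements Stmt {
    public boolean step(Map<String, Integer> env) {
        Integer v = env.get(pred);
        Integer val = env.get(source);
        if (v == null || val == null || v < 0 || v >= targets.size()) return false;
        String t = targets.get(v);
        if (env.containsKey(t)) return false;
        env.put(t, val);
        return true;
    }
}

final class CycleInterpreter {
    // Apply enabled transitions until none fires (terminates for acyclic programs).
    static Map<String, Integer> run(List<Stmt> program, Map<String, Integer> initial) {
        Map<String, Integer> env = new HashMap<>(initial);
        boolean fired = true;
        while (fired) {
            fired = false;
            for (Stmt s : program) fired |= s.step(env);
        }
        return env;
    }
}
```

Consider a small branch whose predicate evaluates to 0, starting from $\rho_0[C_T \mapsto 0]$. Only the false-side names become defined, and the φ-function picks up the single defined input.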

7.3.2 Event-driven semantics 

In the event-driven semantics, configurations consist of an event set and an invariant store. The event set $E$ contains definitions of the form $t = c$. The invariant store maps numbered ζ-functions in the source SSI+ form to sets of tuples representing saved values for loop invariants. We define the following domains:

• $\mathit{Evt} = \mathit{Nam} \times V$ is the event domain. An event consists of a name-value pair. The metavariable $e$ stands for elements of $\mathit{Evt}$.

• $\mathit{Xif} \subseteq \mathit{Int}$ is used to number ζ-functions in the source SSI+ form. There is some mapping function which relates ζ-functions to unique elements of $\mathit{Xif}$. The metavariable $K$ stands for an element of $\mathit{Xif}$.

For $t = op(t_1, \ldots, t_n)$:

$$\langle E[t_1 = v_1] \cdots [t_n = v_n], S \rangle \to \langle E[t = op(v_1, \ldots, v_n)], S \rangle$$

For $t = \phi(t_1, \ldots, t_n)$:

$$\langle E[t_i = v], S \rangle \to \langle E[t = v], S \rangle$$

For $(t_1, \ldots, t_n) = \sigma(P, t)$:

$$\langle E[t = v][P = i], S \rangle \to \langle E[t_{i+1} = v], S \rangle \qquad (0 \le i \le n-1)$$

For $\begin{bmatrix} \langle t_1, \ldots, t_n \rangle \\ \langle t_{n+1}, \ldots, t_m \rangle \end{bmatrix} = \zeta_K \begin{bmatrix} \langle t'_1, \ldots, t'_n \rangle \\ \langle t'_{n+1}, \ldots, t'_m \rangle \end{bmatrix}$:

$$\langle E[t'_i = v], S \rangle \to \langle E[t_i = v], S[K \mapsto S[K] \cup \{(t_i, v)\}] \rangle \qquad (1 \le i \le n)$$

$$\frac{S[K] = \{(t_1, v_1), \ldots, (t_n, v_n)\}}{\langle E[t'_{n+1} = v_{n+1}] \cdots [t'_m = v_m], S \rangle \to \langle E[t_1 = v_1] \cdots [t_m = v_m], S \rangle}$$

Figure 7.7: Event-driven transition rules for SSI+. In the last two rules K is a statement-identifier constant which is unique for each source ζ-function.

A formal definition of our configuration domain is now possible: 

Definition 7.2. 

1. An event set $E : \mathit{Evt}^*$. The notation $E[t = c]$ represents an event set identical to $E$ except that it contains the event $(t, c)$. We say a name $t$ is defined if $(t, v) \in E$ for some $v$. For all distinct $(t_1, v_1), (t_2, v_2) \in E$, $t_1$ and $t_2$ differ. In other words, no name $t$ is multiply defined in an event set. The transition rules enforce this constraint; the definition of $E$ does not.

2. An invariant store $S : \mathit{Xif} \to \mathit{Evt}^*$ is a finite mapping from ζ-functions to event sets.

3. A configuration is a tuple $\langle E, S \rangle : \mathit{Evt}^* \times (\mathit{Xif} \to \mathit{Evt}^*)$ consisting of an event set and an invariant store. Consider procedure parameters $p_0, \ldots, p_n$ mapped to non-⊥ values $v_0, \ldots, v_n$. The initial configuration is $\langle \{C_T = 0, p_0 = v_0, \ldots, p_n = v_n\}_{\mathit{Evt}},\ \emptyset_{\mathit{Xif} \to \mathit{Evt}^*} \rangle$. That is, it consists of an empty event set extended with events for the default constant trigger $C_T$ and the procedure parameters, together with an empty mapping for the invariant store.

Figure 7.7 shows the event-driven transition rules for SSI+ form. As before, each rule is associated with a definition and consists of an optional precondition above a line, followed by a transition. A transition can be performed when its definition is present in the SSI+ form and the precondition (if any) is satisfied. Most transitions remove some event from the event set $E$, replacing it with a new event. The invariant store $S$ stores the values of loop invariants for regeneration at each loop iteration.
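The event-driven rules differ from the cycle-oriented ones mainly in that firing consumes input events. Below is a minimal Java sketch of the op, φ, and σ rules of Figure 7.7, again omitting ζ-functions and the invariant store. It assumes the event set is a map from names to values. Because inputs are consumed, it also assumes each name has a single consumer; a name with two uses would need an explicit copy. `Rule`, `OpRule`, `PhiRule`, `SigmaRule`, and `EventInterpreter` are hypothetical names, and the code uses Java records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// The event set maps each currently defined name to its value; firing consumes inputs.
interface Rule {
    boolean fire(Map<String, Integer> events);
}

// t = op(t1, ..., tn): consume the events for t1..tn and emit an event for t.
record OpRule(String target, List<String> args, Function<int[], Integer> fn) implements Rule {
    public boolean fire(Map<String, Integer> events) {
        if (events.containsKey(target) || !events.keySet().containsAll(args)) return false;
        int[] vals = new int[args.size()];
        for (int k = 0; k < vals.length; k++) vals[k] = events.get(args.get(k));
        for (String a : args) events.remove(a);
        events.put(target, fn.apply(vals));
        return true;
    }
}

// t = phi(t1, ..., tn): consume whichever input event is present and re-emit it as t.
record PhiRule(String target, List<String> args) implements Rule {
    public boolean fire(Map<String, Integer> events) {
        if (events.containsKey(target)) return false;
        for (String a : args) {
            if (events.containsKey(a)) {
                events.put(target, events.remove(a));
                return true;
            }
        }
        return false;
    }
}

// (t1, ..., tn) = sigma(P, t): consume t and P = i, emit the event for t_{i+1} (list index i).
record SigmaRule(List<String> targets, String pred, String source) implements Rule {
    public boolean fire(Map<String, Integer> events) {
        Integer i = events.get(pred);
        Integer v = events.get(source);
        if (i == null || v == null || i < 0 || i >= targets.size()) return false;
        String t = targets.get(i);
        if (events.containsKey(t)) return false;
        events.remove(pred);
        events.remove(source);
        events.put(t, v);
        return true;
    }
}

final class EventInterpreter {
    // Fire enabled rules until none applies (terminates for acyclic programs).
    static Map<String, Integer> run(List<Rule> program, Map<String, Integer> initial) {
        Map<String, Integer> events = new HashMap<>(initial);
        boolean fired = true;
        while (fired) {
            fired = false;
            for (Rule r : program) fired |= r.fire(events);
        }
        return events;
    }
}
```

After a run, only the final result and any unused events (such as an untouched $C_T$) remain in the event set. Intermediate tokens have all been consumed.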

7.4 Construction 

Construction of SSI+ is only a slight variation on the construction algorithms for SSI₀. First, dominator and post-dominator trees are produced using the Lengauer-Tarjan [25] or Harel [16] algorithm. The nodes of the dominator tree are numbered in pre-order such that for all nodes $N$, $num[N] > num[idom[N]]$. Then, in a single traversal of the post-dominator tree, we find the lowest-numbered node post-dominated by any given node. We add triggers to constants from variables defined at this lowest node post-dominated by the constant use, using the default trigger $C_T$ where necessary. We then place φ- and σ-functions for all variables, including constant triggers, using Algorithm 5.3.
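The single traversal of the post-dominator tree can be sketched as a post-order walk. For each node, the walk records the node with the smallest dominator-tree pre-order number in that node's post-dominator subtree, that is, among the nodes it post-dominates (including itself). Below is a minimal Java sketch. It assumes both trees are already built and given as child lists over integer node ids. `TriggerPlacement`, `preorderNumbers`, and `lowestPostDominated` are hypothetical names.

```java
import java.util.List;

// Hypothetical helpers for trigger placement; trees are child lists indexed by node id.
final class TriggerPlacement {
    // Pre-order numbering of the dominator tree, so num[n] > num[idom[n]] for every n.
    static int[] preorderNumbers(List<List<Integer>> domChildren, int root) {
        int[] num = new int[domChildren.size()];
        int[] counter = {0};
        number(domChildren, root, num, counter);
        return num;
    }

    private static void number(List<List<Integer>> children, int n, int[] num, int[] counter) {
        num[n] = counter[0]++;
        for (int c : children.get(n)) number(children, c, num, counter);
    }

    // For every node x, the lowest-numbered node that x post-dominates
    // (its subtree in the post-dominator tree, including x itself).
    static int[] lowestPostDominated(List<List<Integer>> pdomChildren, int exit, int[] num) {
        int[] best = new int[pdomChildren.size()];
        visit(pdomChildren, exit, num, best);
        return best;
    }

    private static int visit(List<List<Integer>> children, int n, int[] num, int[] best) {
        int lowest = n;
        for (int c : children.get(n)) {
            int candidate = visit(children, c, num, best);
            if (num[candidate] < num[lowest]) lowest = candidate;
        }
        best[n] = lowest;
        return lowest;
    }
}
```

Consider a diamond START → branch → {then, else} → exit. A constant used at the exit can be triggered at START. A constant used in the then-branch must be triggered there.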

We then generate ζ-functions. A standard interval analysis creates a loop nesting tree, and each loop is scanned for invariants and other definitions and uses to create the proper ζ-function tuples. Renaming is done using Algorithm 5.4, as before.



7.5 Dataflow and control dependence 

The SSI+ semantics are data-driven, and thus bring to mind work on compilers for dataflow machines. Beck, Johnson, and Pingali have previously written [6] on the benefits of dataflow-oriented intermediate representations. However, previous work on dataflow compilers (Traub [42], for example) has concentrated on intra-loop dependencies, often leaving in pseudo-control-flow edges to serialize non-loop structures. This strategy results in the sort of fine-grain intra-loop parallelism suitable for parallel dataflow machines, vector processors, and VLIW machines.

The current work concentrates on removing unnecessary dependencies between loops. This exposes a coarser parallelism that needs fewer functional units to exploit. Moreover, we extract parallel sequential threads that are not loop-based. Both fine-grain and coarse-grain parallelism are clearly important. However, we feel that current industry trends towards loosely coupled multiprocessors support our coarser-grained approach, which dataflow approaches seem so far to have neglected.
7.6 Hardware compilation

The observant reader may have noticed that the two operational semantics given in section 7.3 closely resemble circuit implementations of the program under synchronous and asynchronous design methodologies. In fact, SSI+ was designed specifically to facilitate rendering a high-level program into hardware. The two semantics differ primarily in how cyclic dependencies (i.e., loops) are handled.

Translation of high-level languages directly to hardware has long been a goal of researchers. Tanaka et al. constructed a system based on FORTRAN [41], and Galloway's C-based hardware description language [13] inspired renewed interest in applying general-purpose languages to the task. The recent widespread use of type-safe object-oriented languages has encouraged speculation that the more favorable analysis properties of these stricter languages would enable further advances in general-use hardware compilation. In this context, the well-defined semantics and dataflow orientation of SSI+ solve the local-level hardware compilation problem. This allows effort to be concentrated on the more difficult intra-procedural analyses required.

8 Methodology 

The SSI intermediate representation described in this paper is the core IR for the FLEX compiler infrastructure project. The project was started in July 1998 and currently contains about 70,000 lines of Java source code. The FLEX compiler reads in Java bytecodes. It targets both the JVM (for high-level portable code transformations) and several combinations of machine architectures and runtime systems. Currently the bytecode and ARM processor backends are near completion. Interpreters exist for the various intermediate representations used in the compiler, allowing the correctness of the earlier passes of the compiler to be verified. The compiler can correctly compile itself to IR and interpret itself.

The FLEX compiler implements the algorithms described in this paper, which provides a practical check of their correctness. Variable counting for the graphs of section 5.4 was done by a special statistics module that could be applied to the results of any pass. The full bit-width-extended SPTC constant propagation algorithm was implemented, although we do not currently use the bit-width information it produces. SSI+ and hardware compilation are the focus of current work.

9 Conclusions 

The Static Single Information form extends SSA, without adding unneeded complexity, to allow efficient predicated analysis and backward dataflow analyses. Further, the SSI+ variant removes all explicit control-dependence relations, allowing extraction of parallelism from the code. It also possesses a complete and straightforward semantics, which makes it useful for abstract interpretation and hardware compilation, among other things.

We have demonstrated efficient construction of SSI form, and several optimizations which use it to obtain efficiency improvements over previous methods. The many SSA-variant papers in the literature attest to limitations of standard SSA form; we believe SSI form solves these problems in a simple and symmetric manner. The FLEX compiler infrastructure demonstrates the practicality of SSI form.